ONS

FIELD OF THE INVENTION

The present invention relates generally to the
cystic fibrosis (CF) gene, and, more particularly, to the
identification, isolation and cloning of the DNA sequence
corresponding to mutants of the CF gene, as well as their
transcripts, gene products and genetic information at
exon/intron boundaries. The present invention also
relates to methods of screening for and detection of CF
carriers, CF diagnosis, prenatal CF screening and
diagnosis, and gene therapy utilizing recombinant
technologies and drug therapy using the information
derived from the DNA, protein, and the metabolic function
of the protein.



BACKGROUND OF THE INVENTION

Cystic fibrosis (CF) is the most common severe
autosomal recessive genetic disorder in the Caucasian
population. It affects approximately 1 in 2000 live
births in North America [Boat et al, The Metabolic Basis
of Inherited Disease, 6th ed, pp 2649-2680, McGraw Hill,
NY (1989)]. Approximately 1 in 20 persons are carriers of
the disease.
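
The two figures quoted above can be cross-checked against each other. The following is a minimal sketch, assuming Hardy-Weinberg equilibrium and a single recessive locus (neither assumption is stated in the text); the function name is illustrative only. Under those assumptions an incidence of 1 in 2000 corresponds to a carrier frequency of roughly 1 in 23, which is in line with the quoted "approximately 1 in 20".

```python
import math


def carrier_frequency(incidence: float) -> float:
    """Expected heterozygote (carrier) frequency 2pq for an autosomal
    recessive disorder whose birth incidence is q**2.

    Assumes Hardy-Weinberg equilibrium; this is an illustrative model,
    not a calculation taken from the specification.
    """
    q = math.sqrt(incidence)  # frequency of the mutant allele
    p = 1.0 - q               # frequency of the normal allele
    return 2.0 * p * q


incidence = 1 / 2000          # live-birth incidence quoted in the text
carriers = carrier_frequency(incidence)
print(f"carrier frequency is about 1 in {1 / carriers:.0f}")
```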
Although the disease was first described in the late
1930's, the basic defect remains unknown. The major 
symptoms of cystic fibrosis include chronic pulmonary 
disease, pancreatic exocrine insufficiency, and elevated 
sweat electrolyte levels. The symptoms are consistent 
with cystic fibrosis being an exocrine disorder. 
Although recent advances have been made in the analysis 
of ion transport across the apical membrane of the 
epithelium of CF patient cells, it is not clear that the 
abnormal regulation of chloride channels represents the 
primary defect in the disease. Given the lack of
understanding of the molecular mechanism of the disease,
an alternative approach has therefore been taken in an
attempt to understand the nature of the molecular defect



WO 91/10734 



PCT/CA91/00009






through direct cloning of the responsible gene on the 
basis of its chromosomal location. 

However, there is no clear phenotype that directs an
approach to the exact nature of the genetic basis of the
disease, or that allows for an identification of the
cystic fibrosis gene. The nature of the CF defect in
relation to the population genetics data has not been
readily apparent. Both the prevalence of the disease and
the clinical heterogeneity have been explained by several
different mechanisms: high mutation rate, heterozygote
advantage, genetic drift, multiple loci, and
reproductive compensation.

Many of the hypotheses cannot be tested due to the
lack of knowledge of the basic defect. Therefore,
alternative approaches to the determination and
characterization of the CF gene have focused on an
attempt to identify the location of the gene by genetic
analysis.

Linkage analysis of the CF gene to antigenic and
protein markers was attempted in the 1950's, but no
positive results were obtained [Steinberg et al, Am. J.
Hum. Genet. 8: 162-176 (1956); Steinberg and Morton, Am.
J. Hum. Genet. 8: 177-189 (1956); Goodchild et al, J. Med.
Genet. 7: 417-419 (1976)].

More recently, it has become possible to use RFLP's
to facilitate linkage analysis. The first linkage of an
RFLP marker to the CF gene was disclosed in 1985 [Tsui et
al. Science 230: 1054-1057, 1985] in which linkage was
found between the CF gene and an uncharacterized marker
DOCRI-917. The association was found in an analysis of
39 families with affected CF children. This showed that
although the chromosomal location had not been
established, the location of the disease gene had been
narrowed to about 1% of the human genome, or about 30
million nucleotide base pairs.

The chromosomal location of the DOCRI-917 probe was
established using rodent-human hybrid cell lines
containing different human chromosome complements. It
was shown that DOCRI-917 (and therefore the CF gene) maps
to human chromosome 7.

Further physical and genetic linkage studies were
pursued in an attempt to pinpoint the location of the CF
gene. Zengerling et al [Am. J. Hum. Genet. 40: 228-236
(1987)] describe the use of human-mouse somatic cell
hybrids to obtain a more detailed physical relationship
between the CF gene and the markers known to be linked
with it. This publication shows that the CF gene can be
assigned to either the distal region of band q22 or the
proximal region of band q31 on chromosome 7.

Rommens et al [Am. J. Hum. Genet. 43: 645-663
(1988)] give a detailed discussion of the isolation of
many new 7q31 probes. The approach outlined led to the
isolation of two new probes, D7S122 and D7S340, which are
close to each other. Pulsed field gel electrophoresis
mapping indicates that these two RFLP markers are between
two markers known to flank the CF gene, MET [White, R.,
Woodward S., Leppert M., et al. Nature 318: 382-384
(1985)] and D7S8 [Wainwright, B. J., Scambler, P. J.,
and J. Schmidtke, Nature 318: 384-385 (1985)], therefore
in the CF gene region. The discovery of these markers
provides a starting point for chromosome walking and
jumping.

Estivill et al [Nature 326: 840-845 (1987)] disclose
that a candidate cDNA gene was located and partially
characterized. This, however, does not teach the correct
location of the CF gene. The reference discloses a
candidate cDNA gene downstream of a CpG island; such
islands are undermethylated GC nucleotide-rich regions
upstream of many vertebrate genes. The chromosomal
localization of the candidate locus is identified as the
XV2C region. This region is described in European Patent
Application 88303645.1. However, that actual region does
not include the CF gene.
A major difficulty in identifying the CF gene [...]
previous [...] genes by knowledge of map position.

[...] rearrangements and deletions could be observed
cytologically and, as a result, a physical [...]

Knowledge of the molecular [...] particular disease would
allow cloning [...] gene by routine procedures [...]
products of the cloned genes.

[...] did not pinpoint its molecular location, the [...]
include chromosome jumping from the flanking [...]
combination of somatic cell hybrid and molecular cloning
techniques designed to isolate DNA fragments from [...]
CpG islands [...] chromosome [...] identify the gene
responsible for cystic fibrosis where the prior art was
uncertain or, even in one case, wrong.

The application of these genetic and molecular
cloning strategies has allowed the isolation and cDNA
cloning of the cystic fibrosis gene on the basis of its
chromosomal location, without the benefit of genomic
rearrangements to point the way. The identification of
the normal and mutant forms of the CF gene and gene
products has allowed for the development of screening and
diagnostic tests for CF utilizing nucleic acid probes and
antibodies to the gene product. Through interaction with
the defective gene product and the pathway in which this
gene product is involved, therapy through normal gene
product supplementation and gene manipulation and
delivery are now made possible.

The gene involved in the cystic fibrosis disease
process, hereinafter the "CF gene" and its functional
equivalents, has been identified, isolated and cDNA
cloned, and its transcripts and gene products identified
and sequenced. A three base pair deletion leading to the
omission of a phenylalanine residue in the gene product
has been determined to correspond to the mutations of the
CF gene in approximately 70% of the patients affected
with CF, with different mutations involved in most if not
all the remaining cases. This subject matter is
disclosed in co-pending United States Patent Application
S.N. 396,894 filed August 22, 1989 and its related
continuation-in-part applications S.N. 399,945 filed
August 24, 1989 and S.N. 401,609 filed August 31, 1989.

SUMMARY OF THE INVENTION

According to this invention, other base pair
deletions or alterations leading to the omission of amino
acid residues in the gene product have been determined.
According to this invention, other nucleotide deletions or
alterations leading to mutations in the DNA sequence
resulting in frameshift or splice mutations have been
determined.






With the identification and sequencing of the mutant
gene and its gene product, nucleic acid probes and
antibodies raised to the mutant gene product can be used
in a variety of hybridization and immunological assays to
screen for and detect the presence of either the
defective CF gene or gene product. Assay kits for such
screening and diagnosis can also be provided. The
genetic information derived from the intron/exon
boundaries is also very useful in various screening and
diagnosis procedures.

Patient therapy through supplementation with the
normal gene product, whose production can be amplified
using genetic and recombinant techniques, or its
functional equivalent, is now also possible. Correction
or modification of the defective gene product through
drug treatment means is now possible. In addition,
cystic fibrosis can be cured or controlled through gene
therapy by correcting the gene defect in situ, or using
recombinant or other vehicles to deliver a DNA sequence
capable of expression of the normal gene product to the
cells of the patient.

According to another aspect of the invention, a
purified mutant CF gene comprises a DNA sequence encoding
an amino acid sequence for a protein where the protein,
when expressed in cells of the human body, is associated
with altered cell function which correlates with the
genetic disease cystic fibrosis.

According to another aspect of the invention, a 
purified RNA molecule comprises an RNA sequence 
corresponding to the above DNA sequence. 

According to another aspect of the invention, a DNA 
molecule comprises a cDNA molecule corresponding to the 
above DNA sequence. 

According to another aspect of the invention, a DNA
molecule comprises a DNA sequence encoding mutant CFTR
polypeptide having the sequence according to the
following Figure 1 for amino acid residue positions 1 to
1480 as further characterized by nucleotide sequence
variants resulting in deletion or alteration of amino
acids at residue positions 85, 148, 178, 455, 493, 507,
542, 549, 551, 560, 563, 574, 1077 and 1092.

According to another aspect of the invention, a DNA 
molecule comprises an intronless DNA sequence encoding a 
mutant CFTR polypeptide having the sequence according to 
Figure 1 for DNA sequence positions 1 to 4575 and
further characterized by nucleotide sequence variants
resulting in deletion or alteration of DNA at DNA 
sequence positions 129, 556, 621+1, 711+1, 1717-1 and 
3659. 
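
The DNA position labels above mix plain numbers with labels carrying a "+1" or "-1" suffix. The following is a minimal sketch of how such labels might be parsed, assuming the conventional reading in which "621+1" means the first intron nucleotide after cDNA position 621 and "1717-1" means the last intron nucleotide before cDNA position 1717. That reading is an assumption; this section does not define the notation. The names `CdnaPosition` and `parse_position` are hypothetical.

```python
import re
from typing import NamedTuple


class CdnaPosition(NamedTuple):
    exon_base: int      # nearest numbered cDNA nucleotide
    intron_offset: int  # 0 = exonic; +n / -n = n bases into the flanking intron


def parse_position(label: str) -> CdnaPosition:
    """Split a label such as '621+1' into (cDNA position, intron offset)."""
    match = re.fullmatch(r"(\d+)([+-]\d+)?", label.strip())
    if match is None:
        raise ValueError(f"unrecognised position label: {label!r}")
    return CdnaPosition(int(match.group(1)), int(match.group(2) or 0))


# DNA sequence positions listed in the text.
labels = ["129", "556", "621+1", "711+1", "1717-1", "3659"]
parsed = {label: parse_position(label) for label in labels}

# Labels with a non-zero offset lie in an intron, next to an exon boundary,
# which is where splice mutations would be expected.
intronic = [label for label, pos in parsed.items() if pos.intron_offset != 0]
print(intronic)
```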

According to another aspect of the invention, a DNA 
molecule comprises a cDNA molecule corresponding to the 
above DNA sequence.

According to another aspect of the invention, the 
cDNA molecule comprises a DNA sequence selected from the 
group consisting of: 

(a) DNA sequences which correspond to the mutant
DNA sequence selected from the group of mutant amino acid
positions of 85, 148, 178, 455, 493, 507, 542, 549, 551,
560, 563, 574, 1077 and 1092 and mutant DNA sequence
positions 129, 556, 621+1, 711+1, 1717-1 and 3659 and
which encode, on expression, for mutant CFTR polypeptide;

(b) DNA sequences which correspond to a fragment of
the selected mutant DNA sequence, including at least
twenty nucleotides;

(c) DNA sequences which comprise at least twenty
nucleotides and encode a fragment of the selected mutant
CFTR protein amino acid sequence;

(d) DNA sequences encoding an epitope encoded by at
least eighteen sequential nucleotides in the selected
mutant DNA sequence.
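
Items (b) to (d) refer to fragments of a minimum length (twenty nucleotides, or eighteen sequential nucleotides, that is, six codons). The following is a minimal sketch, on a made-up toy sequence, of how every fragment of a given length that still contains a chosen mutant position could be enumerated. The function name, the toy sequence and the zero-based index are illustrative assumptions, not material from the specification.

```python
from typing import List


def fragments_covering(sequence: str, index: int, length: int = 20) -> List[str]:
    """Return every contiguous fragment of `length` nucleotides that
    contains the nucleotide at zero-based position `index`."""
    if not 0 <= index < len(sequence):
        raise IndexError("index lies outside the sequence")
    first_start = max(0, index - length + 1)
    last_start = min(index, len(sequence) - length)
    return [sequence[s:s + length] for s in range(first_start, last_start + 1)]


# Toy 30-nucleotide sequence (not taken from Figure 1); position 15 plays
# the role of the mutant nucleotide.
toy_sequence = "ACGT" * 7 + "AC"
fragments = fragments_covering(toy_sequence, index=15, length=20)
print(len(fragments), "fragments of 20 nucleotides cover position 15")
```

A sequence shorter than the requested length yields an empty list, since no fragment of that length exists.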

According to another aspect of the invention, a DNA
sequence selected from the group consisting of:

(a) DNA sequences which correspond to portions of
DNA sequences of boundaries of exons/introns of the
genomic CF gene;

(b) DNA sequences of at least eighteen sequential
nucleotides at boundaries of exons/introns of the genomic
CF gene depicted in Figure 18; and

(c) DNA sequences of at least eighteen sequential 
nucleotides of intron portions of the genomic CF gene of 
Figure 18. 

According to another aspect of the invention, a 
purified nucleic acid probe comprises a DNA or RNA 
nucleotide sequence corresponding to the above noted 
selected DNA sequences of groups (a) to (c).

According to another aspect of the invention, a
purified RNA molecule comprises an RNA sequence
corresponding to the mutant DNA sequence selected from the
group of mutant protein positions consisting of 85, 148,
178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and
1092 and of mutant DNA sequence positions consisting of
129, 556, 621+1, 711+1, 1717-1 and 3659.

A purified nucleic acid probe comprising a DNA or 
RNA nucleotide sequence corresponding to the mutant 
sequences of the above recited group. 

According to another aspect of the invention, a
recombinant cloning vector comprising the DNA sequences
of the mutant DNA and fragments thereof selected from the
group of mutant protein positions consisting of 85, 148,
178, 455, 493, 507, 542, 549, 551, 563, 574, 1077 and
1092 and selected from the group of mutant DNA sequence
positions consisting of 129, 556, 621+1, 711+1, 1717-1
and 3659 is provided. The vector, according to an aspect
of this invention, is operatively linked to an expression
control sequence in the recombinant DNA molecule so that
the selected mutant DNA sequences for the mutant CFTR
polypeptide can be expressed. The expression control
sequence is selected from the group consisting of
sequences that control the expression of genes of
prokaryotic or eukaryotic cells and their viruses and
combinations thereof.

According to another aspect of the invention, a
method for producing a mutant CFTR polypeptide comprises
the steps of:

(a) culturing a host cell transfected with the
recombinant vector for the mutant DNA sequence in a
medium and under conditions favorable for expression of
the mutant CFTR polypeptide selected from the group of
mutant CFTR polypeptides at mutant protein positions 85,
148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574,
1077 and 1092 and mutant DNA sequence positions 129, 556,
621+1, 711+1, 1717-1 and 3659; and

(b) isolating the expressed mutant CFTR
polypeptide.

According to another aspect of the invention, a
purified protein of human cell membrane origin comprises
an amino acid sequence encoded by the mutant DNA
sequences selected from the group of mutant protein
positions of 85, 148, 178, 455, 493, 507, 542, 549, 551,
560, 563, 574, 1077 and 1092 and from the group of mutant
DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and
3659 where the protein, when present in human cell
membrane, is associated with cell function which causes
the genetic disease cystic fibrosis.

According to another aspect of the invention, a
method is provided for screening a subject to determine
if the subject is a CF carrier or a CF patient comprising
the steps of providing a biological sample of the subject
to be screened and providing an assay for detecting in
the biological sample, the presence of at least a member
from the group consisting of:

(a) mutant CF gene selected from the group of
mutant protein positions 85, 148, 178, 455,
493, 507, 542, 549, 551, 560, 563, 574, 1077
and 1092 and from the group of mutant DNA
sequence positions 129, 556, 621+1, 711+1,
1717-1 and 3659;

(b) mutant CF gene products and mixtures thereof;

(c) DNA sequences which correspond to portions of
DNA sequences of boundaries of exons/introns of the
genomic CF gene;

(d) DNA sequences of at least eighteen sequential
nucleotides at boundaries of exons/introns of the genomic
CF gene depicted in Figure 18; and

(e) DNA sequences of at least eighteen sequential
nucleotides of intron portions of the genomic CF gene of
Figure 18.

According to another aspect of the invention, a kit
for assaying for the presence of a CF gene by immunoassay
techniques comprises:

(a) an antibody which specifically binds to a gene
product of the mutant DNA sequence selected from the
group of mutant protein positions 85, 148, 178, 455, 493,
507, 542, 549, 551, 560, 563, 574, 1077 and 1092 and from
the group of mutant DNA sequence positions 129, 556,
621+1, 711+1, 1717-1 and 3659;

(b) reagent means for detecting the binding of the
antibody to the gene product; and

(c) the antibody and reagent means each being
present in amounts effective to perform the immunoassay.

According to another aspect of the invention, a kit
for assaying for the presence of a mutant CF gene by
hybridization technique comprises:

(a) an oligonucleotide probe which specifically
binds to the mutant CF gene having a mutation at a
protein position selected from the group consisting of
85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563,
574, 1077 and 1092 or having a mutation at a DNA sequence
position selected from the group consisting of 129, 556,
621+1, 711+1, 1717-1 and 3659;

(b) reagent means for detecting the hybridization
of the oligonucleotide probe to the mutant CF gene; and

(c) the probe and reagent means each being present
in amounts effective to perform the hybridization assay.

According to another aspect of the invention, an
animal comprises an heterologous cell system. The cell
system includes a recombinant cloning vector [...]
corresponding to [...]

According to another aspect of the invention, in a
polymerase chain reaction to amplify a selected exon of a
cDNA sequence of Figure 1, the use of oligonucleotide
primers from intron portions near the 5' and 3'
boundaries of the selected exon of Figure 18.

Figure 1 is the nucleotide sequence of the CF gene
and the amino acid sequence of the CFTR protein [...]

Figure 2 is a restriction map of the CF gene and the
[...] chromosome walk and jump [...]

Figure 3 is a [...] map of the region including [...]
the CF gene generated by pulsed field gel
electrophoresis. Panels A, B, C and D show
hybridization data for the restriction enzymes Sal I,
Xho I, Sfi I and [...] I, respectively, generated by
representative genomic and cDNA probes which span the
region. The deduced physical map for each restriction
enzyme is shown below each panel. A composite map of the
entire MET - D7S8 interval is shown in Panel E (J. M.
Rommens et al., Am. J. Hum. Genet. 45: 932-941, 1989).
The open boxed segment indicates the portion cloned by
chromosome walking and jumping, and the filled arrow
indicates the portion covered by the CF transcript.

Figures 4A, 4B and 4C show the detection of 
conserved nucleotide sequences by cross-species 
hybridization. 
Figure 4D is a restriction map of overlapping
segments of probes E4.3 and H1.6.

Figure 5 is an RNA blot hybridization analysis using
genomic and cDNA probes. Hybridization to RNA of: A -
fibroblast with cDNA probe G-2; B - trachea (from
unafflicted and CF patient individuals), pancreas, liver,
HL60 cell line and brain with genomic probe CF16; C - T84
cell line with cDNA probe 10-1.

Figure 6 is the methylation status of the E4.3 
cloned region at the 5' end of the CF gene. 

Figure 7 is a restriction map of the CFTR cDNA 
showing alignment of the cDNA to the genomic DNA 
fragments. 

Figure 8 is an RNA gel blot analysis depicting
hybridization by a portion of the CFTR cDNA (clone 10-1)
to a 6.5 kb mRNA transcript in various human tissues.

Figure 9 is a DNA blot hybridization analysis
depicting hybridization by the CFTR cDNA clones to
genomic DNA digested with EcoRI and Hind III.

Figure 10 is a primer extension experiment
characterizing the 5' and 3' ends of the CFTR cDNA.

Figure 11 is a hydropathy profile and shows 
predicted secondary structures of CFTR. 

Figure 12 is a dot matrix analysis of internal 
homologies in the predicted CFTR polypeptide. 

Figure 13 is a schematic model of the predicted CFTR 
protein. 

Figure 14 is a schematic diagram of the restriction 
fragment length polymorphisms (RFLP's) closely linked to 
the CF gene where the inverted triangle indicates the 
location of the F508 3 base pair deletion. 

Figure 15 represents alignment of the most conserved 
segments of the extended NBFs of CFTR with comparable 
regions of other proteins. 

Figure 16 is the DNA sequence around the F508 
deletion. 
Figure 17 is a representation of the nucleotide
sequencing gel showing the DNA sequence at the F508
deletion.

Figure 18 is the nucleotide sequence of the portions
of introns and complete exons of the genomic CF gene for
27 exons identified and numbered sequentially as 1
through 24 with additional exons 6a, 6b, 14a, 14b and
17a, 17b of cDNA sequence of Figure 1;

Figure 19 shows the results of amplification of
genomic DNA using intron oligonucleotides bounding exon
10;

Figure 20 shows the separation by gel
electrophoresis of the amplified genomic DNA products of
a CF family; and

Figure 21 is a restriction mapping of cloned intron
and exon portions of genomic DNA which introns and exons
are identified in Figure 18.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
DEFINITIONS

In order to facilitate review of the various
embodiments of the invention and an understanding of
various elements and constituents used in making the
invention and using same, the following definitions of
terms used in the invention description are provided:

CF - cystic fibrosis

CF carrier - a person in apparent health whose 
chromosomes contain a mutant CF gene that may be 
transmitted to that person's offspring. 

CF patient - a person who carries a mutant CF gene 
on each chromosome, such that they exhibit the clinical 
symptoms of cystic fibrosis. 

CF gene - the gene whose mutant forms are associated
with the disease cystic fibrosis. This definition is
understood to include the various sequence polymorphisms
that exist, wherein nucleotide substitutions in the gene
sequence do not affect the essential function of the gene
product. This term primarily relates to an isolated
coding sequence, but can also include some or all of the
flanking regulatory elements and/or introns.

Genomic CF gene - the CF gene which includes
flanking regulatory elements and/or introns at boundaries
of exons of the CF gene.

CF - PI - cystic fibrosis pancreatic insufficient, 
the major clinical subgroup of cystic fibrosis patients, 
characterized by insufficient pancreatic exocrine 
function. 

CF - PS - cystic fibrosis pancreatic sufficient, a
clinical subgroup of cystic fibrosis patients with
sufficient pancreatic exocrine function for normal
digestion of food.

CFTR - cystic fibrosis transmembrane conductance
regulator protein, encoded by the CF gene. This
definition includes the protein as isolated from human or
animal sources, as produced by recombinant organisms, and
as chemically or enzymatically synthesized. This
definition is understood to include the various
polymorphic forms of the protein wherein amino acid
substitutions in the variable regions of the sequence
do not affect the essential functioning of the protein,
or its hydropathic profile or secondary or tertiary
structure.

DNA - standard nomenclature is used to identify the
bases.

Intronless DNA - a piece of DNA lacking internal 
non-coding segments, for example, cDNA. 

IRP locus sequence - (protooncogene int-1 related),
a gene located near the CF gene.

Mutant CFTR - a protein that is highly analogous to
CFTR in terms of primary, secondary, and tertiary
structure, but wherein a small number of amino acid
substitutions and/or deletions and/or insertions result
in impairment of its essential function, so that
organisms whose epithelial cells express mutant CFTR
rather than CFTR demonstrate the symptoms of cystic
fibrosis.

mCF - a mouse gene orthologous to the human CF gene

NBFs - nucleotide (ATP) binding folds

ORF - open reading frame

PCR - polymerase chain reaction

Protein - standard single letter nomenclature is
used to identify the amino acids

R-domain - a highly charged cytoplasmic domain of
the CFTR protein

RSV - Rous Sarcoma virus

SAP - surfactant protein

RFLP - restriction fragment length polymorphism

507 mutant CFTR protein or mutant CFTR protein amino
acid sequence, or mutant CFTR polypeptide - the mutant
CFTR protein wherein an amino acid deletion [...]
isoleucine 506 or 507 protein position of the CFTR [...]
amino acid position.

ISOLATING THE CF GENE

[...] hybridization, DNA sequences encompassing > 500
kilobase [...] chromosome 7 containing the cystic [...]
aforementioned co-pending United States patent
applications. For purposes of convenience in
understanding and isolating the CF gene and identifying
other mutations, such as at the 85, 148, 178, 455, 493,
507, 542, 549, 560, 563, 574, 1077 and 1092 amino acid
residue positions, the technique is reiterated here.
Several transcribed sequences and [...] been identified
in [...] One of these corresponds to the CF gene and
spans approximately 250 kb of genomic DNA. Overlapping
complementary DNA (cDNA) clones have been isolated from
epithelial cell libraries with a genomic DNA segment
containing a portion of the cystic fibrosis gene. The
nucleotide sequence of the isolated cDNA is shown in
Figures 1 through 18. In each row of the respective
sequences the lower row is a list by standard
nomenclature of the nucleotide sequence. The upper row
in each respective row of sequences is standard single
letter nomenclature for the amino acid corresponding to
the respective codon.
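
The two-row layout described above (single letter amino acids over their codons) can be reproduced from a nucleotide string. The following is a minimal sketch, assuming the standard genetic code and a reading frame starting at the first nucleotide; the input string is a toy example rather than an excerpt from the Figures, and the function name is hypothetical.

```python
from itertools import product
from typing import Tuple

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def two_row_layout(dna: str) -> Tuple[str, str]:
    """Return (upper, lower): the amino acid letters centred over their
    codons, and the codons themselves separated by single spaces.
    Trailing nucleotides that do not fill a codon are ignored."""
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    upper = " ".join(f"{CODON_TABLE[codon]:^3}" for codon in codons)
    lower = " ".join(codons)
    return upper, lower


upper_row, lower_row = two_row_layout("ATGCAGAGG")  # toy input
print(upper_row)
print(lower_row)
```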

Accordingly, the isolation of the CF gene provided a 
cDNA molecule comprising a DNA sequence selected from the 
group consisting of: 

(a) DNA sequences which correspond to the DNA
sequence of Figure 1 from amino acid residue position 1
to position 1480;

(b) DNA sequences encoding normal CFTR polypeptide
having the sequence according to Figure 1 for amino acid
residue positions from 1 to 1480;

(c) DNA sequences which correspond to a fragment of
the sequence of Figure 1 including at least 16 sequential
nucleotides between amino acid residue positions 1 and
1480;

(d) DNA sequences which comprise at least 16
nucleotides and encode a fragment of the amino acid
sequence of Figure 1; and

(e) DNA sequences encoding an epitope encoded by at
least 18 sequential nucleotides in the sequence of Figure
1 between amino acid residue positions 1 and 1480.

According to this invention, the isolation of other 
mutations in the CF gene also provides a cDNA molecule 
comprising a DNA sequence selected from the group 
consisting of: 

a) DNA sequences which correspond to the DNA
sequence encoding mutant CFTR polypeptide characterized
by cystic fibrosis-associated activity in human
epithelial cells, or the DNA sequence of Figure 1 for the
amino acid residue positions 1 to 1480 yet further
characterized by a base pair mutation which results in
the deletion of or a change for an amino acid at residue
positions 85, 148, 178, 455, 493, 507, 542, 549, 551,
560, 563, 574, 1077 and 1092;

b) DNA sequences which correspond to fragments of
the mutant portion of the sequence of paragraph a) and
which include at least sixteen nucleotides;

c) DNA sequences which comprise at least sixteen
nucleotides and encode a fragment of the amino acid
sequence encoded for by the mutant portion of the DNA
sequence of paragraph a); and

d) DNA sequences encoding an epitope encoded by at
least 18 sequential nucleotides in the mutant portion of
the sequence of the DNA of paragraph a).

Transcripts of approximately 6,500 nucleotides in 
size are detectable in tissues affected in patients with 
CF. Based upon the isolated nucleotide sequence, the 
predicted protein consists of two similar regions, each 
containing a first domain having properties consistent 
with membrane association and a second domain believed to 
be involved in ATP binding. 

A 3 bp deletion which results in the omission of a
phenylalanine residue at the center of the first
predicted nucleotide binding domain (amino acid position
508 of the CF gene product) was detected in CF patients.
This mutation in the normal DNA sequence of Figure 1
corresponds to approximately 70% of the mutations in
cystic fibrosis patients. Extended haplotype data based
on DNA markers closely linked to the putative disease
gene suggest that the remainder of the CF mutant gene
pool consists of multiple, different mutations. This is
now exemplified by this invention at, for example, the
506 or 507 protein position. A small set of these latter
mutant alleles (approximately 8%) may confer residual
pancreatic exocrine function in a subgroup of patients
who are pancreatic sufficient.
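
A three nucleotide deletion removes one amino acid without shifting the reading frame of the codons that follow. The following is a minimal sketch of that effect. The fifteen nucleotides used (codons 506 to 510) are recalled from the published sequence around position 508 and are an assumption here; they should be checked against Figure 1 and Figure 16 before being relied on. The helper names and the reduced codon dictionary are illustrative only.

```python
# Only the codons needed for this example (standard genetic code).
CODONS = {"ATC": "I", "ATT": "I", "TTT": "F", "GGT": "G", "GTT": "V"}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string using the small table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def delete_bases(dna: str, start: int, length: int = 3) -> str:
    """Remove `length` nucleotides beginning at zero-based index `start`."""
    return dna[:start] + dna[start + length:]


# Assumed normal sequence for codons 506-510: Ile Ile Phe Gly Val.
normal = "ATCATCTTTGGTGTT"

# Delete C-T-T: the last base of codon 507 and the first two of codon 508.
mutant = delete_bases(normal, start=5)

print(translate(normal))  # isoleucine, isoleucine, phenylalanine, glycine, valine
print(translate(mutant))  # phenylalanine is lost; downstream codons unchanged
```

Because the deleted length is a multiple of three, the glycine and valine codons downstream are read unchanged, in contrast to a frameshift mutation.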

1.1 CHROMOSOME WALKING AND JUMPING

Large amounts of the DNA surrounding the D7S122 and
D7S340 linkage regions of Rommens et al supra were
searched for candidate gene sequences. In addition to
conventional chromosome walking methods, chromosome
jumping techniques were employed to accelerate the search
process. From each jump endpoint a new bidirectional
walk could be initiated. Sequential walks halted by
"unclonable" regions often encountered in the mammalian
genome could be circumvented by chromosome jumping.

The chromosome jumping library used has been
described previously [Collins et al, Science 235, 1046
(1987); Ianuzzi et al, Am. J. Hum. Genet. 44, 695
(1989)]. The original library was prepared from a
preparative pulsed field gel, and was intended to contain
partial EcoRI fragments of 70 - 130 kb; subsequent
experience with this library indicates that smaller
fragments were also represented, and jump sizes of 25 -
110 kb have been found. The library was plated on sup-
host MC1061 and screened by standard techniques
[Maniatis et al]. Positive clones were subcloned into
pBRA23Ava and the beginning and end of the jump
identified by EcoRI and Ava I digestion, as described in
Collins, Genome analysis: A practical approach (IRL,
London, 1988), pp. 73-94. For each clone, a fragment
from the end of the jump was checked to confirm its
location on chromosome 7. The contiguous chromosome
region covered by chromosome walking and jumping was
about 250 kb. Direction of the jumps was biased by
careful choice of probes, as described by Collins et al
and Ianuzzi et al, supra. The entire region cloned,
including the sequences isolated with the use of the CF
gene cDNA, is approximately 500 kb.

The schematic representation of the chromosome 
walking and jumping strategy is illustrated in Figure 2. 
CF gene exons are indicated by Roman numerals in this
Figure. Horizontal lines above the map indicate walk
steps whereas the [...] above the map indicate jump steps.
The Figure proceeds from left to right in each of six
tiers with the direction of ends [...] indicated. The
restriction map for [...] spanning the entire cloned
region. [...] indicated with [...] rather than vertical
lines indicate sites which have not [...] In addition,
[...] indicating the walking progress obtained with [...]
clones begin with the letter c; [...] phage. Cosmid CF26
proved to be [...] portion is derived from a [...] the
experiments. Three of the [...] represent independent
subcloning of [...] to detect polymorphisms [...] (1987)),
probe E1 corresponds to KM19 (Estivill [...]), and probe
E4.1 corresponds to Mp6d.9 (X. Estivill et al, Am. J.
Hum. Genet. 44, 704 (1989)). [...] which detects a
transcribed sequence. [...] the IRP locus sequence [...]
location of this transcript on the genomic map.
As the two independently isolated DNA markers,
D7S122 (pH131) and D7S340 (TM58), were only
approximately 10 kb apart (Figure 2), the walks and jumps
were essentially initiated from a single point. The
direction of walking and jumping with respect to MET and
D7S8 was then established with the crossing of several
rare-cutting restriction endonuclease recognition sites
(such as those for Xho I, Nru I and Not I, see Figure 2)
and with reference to the long range physical map of J.
M. Rommens et al, Am. J. Hum. Genet., in press; A. M.
Poustka et al, Genomics 2, 337 (1988); M. L. Drumm et
al, Genomics 2, 346 (1988). The pulsed field mapping
data also revealed that the Not I site identified by the
inventors of the present invention (see Figure 2,
position 113 kb) corresponded to the one previously found
associated with the IRP locus (Estivill et al 1987,
supra). Since subsequent genetic studies showed that CF
was most likely located between IRP and D7S8 [M. Farrall
et al, Am. J. Hum. Genet. 43, 471 (1988); B.S. Kerem et
al, Am. J. Hum. Genet. 44, 827 (1989)], the walking and
jumping effort was continued exclusively towards cloning
of this interval. It is appreciated, however, that other
coding regions, as identified in Figure 2, for example,
G-2, CF14 and CF16, were located and extensively
investigated. Such extensive investigations of these
other regions revealed that they were not the CF gene
based on genetic data and sequence analysis. Given the
lack of knowledge of the location of the CF gene and its
characteristics, the extensive and time consuming
examination of the nearby presumptive coding regions did
not advance the direction of search for the CF gene.
However, these investigations were necessary in order to
rule out the possibility of the CF gene being in those
regions.

Three regions in the 280 kb segment were found not 
to be readily recoverable in the amplified genomic 
libraries initially used. These less clonable regions 
were located near the DNA segments [illegible] and X.6, and
just beyond cosmid cW44, at positions 75-100 kb, 205-225
[illegible]. The recombinant clones near [illegible] were
[illegible] after only a few passages of bacterial
culture. To fill in [illegible], special host-vector
systems which have been reported to allow propagation of
unstable sequences [A. R. Wyman, L. B. Wolfe, D.
Botstein, Proc. [illegible] 2880 (1985); K. F. Wertman,
A. R. Wyman, [illegible]] [illegible]. Although the
region near cosmid cW44 remains to be recovered, the
region near X.6 was successfully [illegible] with these
libraries.

CONSTRUCTION OF GENOMIC LIBRARIES

[illegible] according to procedures described in Maniatis
et al, supra. Four phage libraries were cloned in λdash
(commercially available from Stratagene), and three in
λFIX (commercially available from Stratagene), with
vector arms provided by the manufacturer. One λdash
library was constructed from [illegible] partially
digested DNA from a human-hamster hybrid containing
human chromosome 7 (4AF/102/K015) [Rommens et al,
[illegible]], and other libraries [illegible] total EcoRI
[illegible]. [illegible] sequences, five of the phage
libraries were propagated on the recombination-deficient
hosts DB1316
(recD-), CES 200 (recBC-) [Wyman et al, supra; Wertman et
al, supra; Wyman et al, supra]; or TAP90 [Patterson et al,
Nucleic Acids Res. 15:6298 (1987)]. Three cosmid
libraries were then constructed. In one, the vector
pCV108 [Lau et al, Proc. Natl. Acad. Sci. USA 80:5225
(1983)] was used to clone partially digested (Sau 3A) DNA
from 4AF/102/K015 [Rommens et al, Am. J. Hum. Genet. 43:4
(1988)]. A second cosmid library was prepared by cloning
partially digested (Mbo I) human lymphoblastoid DNA into
the vector pWE-IL2R, prepared by inserting the RSV (Rous
Sarcoma Virus) promoter-driven cDNA for the interleukin-2
receptor α-chain (supplied by M. Fordis and B. Howard) in
place of the neo-resistance gene of pWE15 [Wahl et al,
Proc. Natl. Acad. Sci. USA 84:2160 (1987)]. An
additional partial Mbo I cosmid library was prepared in
the vector pWE-IL2-Sal, created by inserting a Sal I
linker into the Bam HI cloning site of pWE-IL2R (M.
Drumm, unpublished data); this allows the use of the
partial fill-in technique to ligate Sal I and Mbo I ends,
preventing tandem insertions [Zabarovsky et al, Gene 42:19
(1986)]. Cosmid libraries were propagated in E. coli
host strains DH1 or 490A [M. Steinmetz, A. Winoto, K.
Minard, L. Hood, Cell 28, 489 (1982)].
TABLE 1
GENOMIC LIBRARIES

Vector        Source of human DNA                     Host         Complexity             Ref

[illegible]   Hae II/[illegible]-partially            LE392        1 x 10^6               Lawn et al
              digested total human liver DNA                       (amplified)            1980

pCV108        Sau3A-partially digested DNA            DH1          3 x 10^6
              from 4AF/K015                                        (amplified)

λdash         Sau3A-partially digested DNA            LE392        1 x 10^6
              from 4AF/K015                                        (amplified)

λdash         Sau3A-partially digested total          DB1316       1.5 x 10^6
              human peripheral blood DNA

λdash         BamHI-digested total human              DB1316       1.5 x 10^[illegible]
              peripheral blood DNA

λdash         EcoRI-partially digested total          DB1316       8 x 10^6
              human peripheral blood DNA

λFIX          MboI-partially digested human           LE392        1.5 x 10^[illegible]
              lymphoblastoid DNA

λFIX          MboI-partially digested human           CES200       1.2 x 10^6
              lymphoblastoid DNA

λFIX          MboI-partially digested human           TAP90        1.3 x 10^6
              lymphoblastoid DNA

pWE-IL2R      MboI-partially digested human           490A         5 x 10^[illegible]
              lymphoblastoid DNA

[illegible]   [illegible]                             [illegible]  1.2 x 10^6

[illegible]   [illegible] human lymphoblastoid        [illegible]  3 x 10^6               Collins et al,
(jumping)     DNA                                                                         supra; Iannuzzi
                                                                                          et al, supra

Three of the phage libraries were propagated and
amplified in E. coli bacterial strain LE392. Four
subsequent libraries were plated on the
recombination-deficient hosts DB1316 (recD-) or CES200
(recBC-) [Wyman 1985, supra; Wertman 1986, supra; and
Wyman 1986, supra] or in one case TAP90 [T.A. Patterson
and M. Dean, Nucleic Acids Research 15, 6298 (1987)].

Single copy DNA segments (free of repetitive
elements) near the ends of each phage or cosmid insert
were purified and used as probes for library screening to
isolate overlapping DNA fragments by standard procedures
(Maniatis et al, supra).

1-2 x 10^6 phage clones were plated on 25-30 150 mm
petri dishes with the appropriate indicator bacterial
host and incubated at 37 °C for 10-16 hr. Duplicate
"lifts" were prepared for each plate with nitrocellulose
or nylon membranes, prehybridized and hybridized under
conditions described [Rommens et al, 1988, supra].
Probes were labelled with 32P to a specific activity of >5
x 10^8 cpm/µg using the random priming procedure [A. P.
Feinberg and B. Vogelstein, Anal. Biochem. 132, 6
(1983)]. The cosmid library was spread on
ampicillin-containing plates and screened in a similar
manner.

DNA probes which gave high background signals could
often be used more successfully by preannealing the
boiled probe with 250 µg/ml sheared denatured placental
DNA for 60 minutes prior to adding the probe to the
hybridization bag.

For each walk step, the identity of the cloned DNA
fragment was determined by hybridization with a somatic
cell hybrid panel to confirm its chromosomal location,
and by restriction mapping and Southern blot analysis to
confirm its colinearity with the genome.

The total combined cloned region of the genomic DNA
sequences isolated and the overlapping cDNA clones
extended >500 kb. To ensure that the DNA segments
isolated by the chromosome walking and jumping procedures
were colinear with the genomic sequence, each segment was
examined by:

(a) hybridization analysis with human-rodent somatic
hybrid cell lines to confirm chromosome 7 localization,
(b) pulsed field gel electrophoresis, and
(c) comparison of the restriction map of the cloned
DNA to that of the genomic DNA.

Accordingly, single copy human DNA sequences were
isolated from each recombinant phage and cosmid clone and
[illegible] of these analyses as
performed by the procedure of Maniatis et al, supra.

While the majority of phage and cosmid isolates
represented correct walk and jump clones, a few resulted
from cloning artifacts or cross-hybridizing sequences
from other regions in the human genome, or from the
hamster genome in cases where the libraries were derived
from a human-hamster hybrid cell line. Confirmation of
correct localisation was particularly important for
clones isolated by chromosome jumping. Many jump clones
were considered and resulted in non-conclusive
[illegible] direction of [illegible].

[illegible] the overlapping clones was obtained by long
range restriction mapping analysis with the use of pulsed
field gel electrophoresis (J. M. Rommens et al, Am. J.
Hum. Genet., in press; A. M. Poustka et al, 1988, supra;
M. L. Drumm et al, 1988, supra).

Figures 3A to 3E illustrate the findings of the
long range restriction mapping study, where a schematic
[illegible]. DNA from the human-hamster cell line
4AF/102/K015 was digested with the enzymes (A) Sal I, (B)
Xho I, (C) Sfi I and (D) Nae I, separated by pulsed field
gel electrophoresis, and transferred to Zetaprobe™
(BioRad). For each enzyme a single blot was sequentially
hybridized with the probes indicated below each of the
panels of Figure 3A to 3D, with stripping of the blot
between hybridizations. The symbols for each enzyme of
Figure 3E are: A, Nae I; B, Bss HII; F, Sfi I; L, Sal I;
M, Mlu I; N, Not I; R, Nru I; and X, Xho I. C
corresponds to the compression zone region of the gel.
DNA preparations, restriction digestion, and crossed
field gel electrophoresis methods have been described
(Rommens et al, in press, supra). The gels in Figure 3
were run in 0.5X TBE at 7 volts/cm for 20 hours with
switching linearly ramped from 10-40 seconds for (A),
(B), and (C), and at 8 volts/cm for 20 hours with
switching ramped linearly from 50-150 seconds for (D).
Schematic interpretations of the hybridization pattern
are given below each panel. Fragment lengths are in
kilobases and were sized by comparison to oligomerized
bacteriophage λ DNA and Saccharomyces cerevisiae
chromosomes.
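
The linear switching ramps quoted for the pulsed field gels can be made concrete with a short calculation. The following is a minimal sketch, not part of the described method: the function name `switch_time` and its parameters are hypothetical, and it assumes only that the switching interval rises linearly with elapsed run time, as stated for panels (A) to (D).

```python
def switch_time(elapsed_h: float, run_h: float, start_s: float, end_s: float) -> float:
    """Switching interval (seconds) at a given elapsed time of a linear ramp.

    Assumes the interval rises linearly from start_s at time 0 to end_s
    at the end of the run (run_h hours).
    """
    if not 0 <= elapsed_h <= run_h:
        raise ValueError("elapsed time must lie within the run")
    return start_s + (end_s - start_s) * elapsed_h / run_h


# Panels (A)-(C): 20 hour run, ramp from 10 to 40 seconds.
halfway_abc = switch_time(10, 20, 10, 40)
# Panel (D): 20 hour run, ramp from 50 to 150 seconds.
halfway_d = switch_time(10, 20, 50, 150)
print(halfway_abc, halfway_d)
```

Longer switching intervals resolve larger fragments, which is consistent with the longer ramp being used for panel (D).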

H4.0, J44, E61.4 are genomic probes generated from
the walking and jumping experiments (see Figure 2). J30
has been isolated by four consecutive jumps from D7S8
(Collins et al, 1987, supra; Iannuzzi et al, 1989, supra;
M. Dean et al, submitted for publication). 10-1, B.75,
and CE1.5/1.0 are cDNA probes which cover different
regions of the CF transcript: 10-1 contains exons I -
VI, B.75 contains exons V - XII, and CE1.5/1.0 contains
exons XII - XXIV. Shown in Figure 3E is a composite map
of the entire MET - D7S8 interval. The open boxed region
indicates the segment cloned by walking and jumping, and
the closed arrow portion indicates the region covered by
the CF transcript. The CpG-rich region associated with
the D7S23 locus (Estivill et al, 1987, supra) is at the
Not I site shown in parentheses. This and other sites
shown in parentheses or square brackets do not cut in
4AF/102/K015, but have been observed in human lymphoblast
cell lines.
IDENTIFICATION OF CF GENE

Based on the findings of long range restriction
mapping detailed above it was determined that the entire
CF gene is contained on a 380 kb Sal I fragment.
Alignment of the restriction sites derived from pulsed
field gel analysis to those identified in the partially
overlapping genomic DNA clones revealed that the size of
the CF gene was approximately 250 kb.

The most informative restriction enzyme that served
to align the map of the cloned DNA fragments and the long
range restriction map was Xho I; all of the 9 [illegible]
identified with the recombinant DNA clones [illegible]
susceptible to at least partial cleavage in [illegible]
(compare maps in Figures 1 and 2). Furthermore,
hybridization analysis with probes derived from the 3'
end of the CF gene identified 2 Sfi I sites and confirmed
the position of an anticipated Nae I site.

These findings further supported the conclusion that
the DNA segments isolated by the chromosome walking and
jumping procedures were colinear with the genuine
sequence.




A positive result based on one or more of the
following criteria suggested that a cloned DNA segment
may contain candidate gene sequences:

(a) detection of cross-hybridizing sequences in
other species (as many genes show evolutionary
conservation),
(b) identification of CpG islands, which often mark
the 5' end of vertebrate genes [A. P. Bird, Nature 321,
209 (1986); M. Gardiner-Garden and M. Frommer, J. Mol.
Biol. 196, 261 (1987)],
(c) examination of possible mRNA transcripts in
tissues affected in CF patients,
(d) isolation of corresponding cDNA sequences,
(e) identification of open reading frames by direct
sequencing of cloned DNA segments.
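
Criterion (b) can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the procedure used in this work: the function names and the default thresholds (at least 200 bp, GC fraction above 0.5, observed/expected CpG ratio above 0.6) are introduced here as commonly quoted working values and are not taken from the present text. The observed/expected ratio is computed as (number of CpG x length) / (number of C x number of G).

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC fraction and observed/expected CpG ratio of a sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # CpG dinucleotides on the given strand
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "cpg_obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed thresholds (hypothetical defaults) to one sequence window."""
    st = cpg_stats(seq)
    return (st["length"] >= min_len
            and st["gc_fraction"] > min_gc
            and st["cpg_obs_exp"] > min_obs_exp)


# Toy sequences, invented for illustration only.
print(cpg_stats("CGCGCGGCGC"))
print(looks_like_cpg_island("CG" * 100), looks_like_cpg_island("AT" * 100))
```

In practice such a test would be applied in a sliding window along the cloned sequence; the single-window form is kept here for brevity.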
Cross-species hybridization showed strong sequence
conservation between human and bovine DNA when CF14, E4.3
and H1.6 were used as probes, the results of which are
shown in Figures 4A, 4B and 4C.

Human, bovine, mouse, hamster, and chicken genomic
DNAs were digested with Eco RI (R), Hind III (H), and Pst
I (P), electrophoresed, and blotted to Zetabind™
(BioRad). The hybridization procedures of Rommens et al,
1988, supra, were used with the most stringent wash at
55 °C, 0.2X SSC, and 0.1% SDS. The probes used for
hybridization, in Figure 4, included: (A) entire cosmid
CF14, (B) E4.3, (C) H1.6. In the schematic of Figure
4(D), the shaded region indicates the area of
cross-species conservation.
The fact that different subsets of bands were
detected in bovine DNA with these two overlapping DNA
segments (H1.6 and E4.3) suggested that the conserved
sequences were located at the boundaries of the
overlapped region (Figure 4(D)). When these DNA segments
were used to detect RNA transcripts from a variety of
tissues, no hybridization signal was detected. In an
attempt to understand the cross-hybridizing region and to
identify possible open reading frames, the DNA sequences
of the entire H1.6 and part of the E4.3 fragment were
determined. The results showed that, except for a long
stretch of CG-rich sequence containing the recognition
sites for two restriction enzymes (Bss HII and Sac II),
often found associated with undermethylated CpG islands,
there were only short open reading frames which could not
easily explain the strong cross-species hybridization
signals.
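
The search for open reading frames in a sequenced fragment can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis actually performed: the function name `longest_orf` is hypothetical, only the three forward frames of the given strand are scanned, and an ORF is taken to run from an ATG to the first in-frame stop codon (TAA, TAG or TGA). The example sequence is invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG-to-stop ORF in the three forward frames ('' if none)."""
    s = seq.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the ATG opening the current ORF, if any
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = s[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


# Toy sequence, invented for illustration only.
print(longest_orf("CCATGAAATTTGGGTAACC"))
```

A complete scan would also examine the reverse complement; a fragment whose longest ORF is short in all six frames would match the situation described for H1.6 and E4.3.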

To examine the methylation status of this highly
CpG-rich region revealed by sequencing, genomic DNA
samples prepared from fibroblasts and lymphoblasts were
digested with the restriction enzymes Hpa II and Msp I
and analyzed by gel blot hybridization. The enzyme Hpa
II cuts the DNA sequence 5'-CCGG-3' only when the second
cytosine is unmethylated, whereas Msp I cuts this
sequence regardless of the state of methylation. Small
DNA fragments were generated by both enzymes, indicating
that this CpG-rich region is indeed undermethylated in
genomic DNA. The gel-blot hybridization with the E4.3
segment (Figure 6) reveals very small hybridizing
fragments with both enzymes, indicating the presence of a
hypomethylated CpG island.
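
The logic of the Hpa II / Msp I comparison can be expressed as a small simulation. The sketch below is illustrative only: the function name `digest`, the representation of methylation as a set of 0-based positions of methylated cytosines on one strand, and the toy sequence are all assumptions introduced here. It follows the rule stated above (Hpa II is blocked when the second cytosine of CCGG is methylated; Msp I is not) and places the cut after the first C of each site.

```python
import re


def digest(seq: str, methylated: set, enzyme: str) -> list:
    """Return fragment lengths after digestion of a linear sequence at CCGG sites.

    methylated: 0-based positions of methylated cytosines (hypothetical model).
    enzyme: "HpaII" (blocked by methylation of the second C) or "MspI".
    """
    if enzyme not in ("HpaII", "MspI"):
        raise ValueError("enzyme must be 'HpaII' or 'MspI'")
    s = seq.upper()
    cuts = []
    for m in re.finditer("CCGG", s):
        second_c = m.start() + 1
        if enzyme == "MspI" or second_c not in methylated:
            cuts.append(m.start() + 1)  # cut between the first and second C
    bounds = [0] + cuts + [len(s)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


toy = "AAAACCGGTTTTTTCCGGAA"  # invented sequence with two CCGG sites
print(digest(toy, set(), "HpaII"))   # unmethylated: both sites cut
print(digest(toy, {15}, "HpaII"))    # second site methylated: one cut
print(digest(toy, {15}, "MspI"))     # Msp I cuts both sites regardless
```

When both enzymes give the same small fragments, as reported for the E4.3 region, the sites in that region behave as unmethylated.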

The above results strongly suggest the presence of a
coding region at this locus. Two DNA segments (E4.3 and
H1.6) which detected cross-species hybridization signals
from this area were used as probes to screen cDNA
libraries made from several tissues and cell types.

cDNA libraries from cultured epithelial cells were
prepared as follows. Sweat gland cells derived from a
non-CF individual and from a CF patient were grown to
first passage as described [G. Collie et al, In Vitro
Cell. Dev. Biol. 21, 592, 1985]. The presence of
outwardly rectifying channels was confirmed in these
cells (J.A. Tabcharani, T.J. Jensen, J.R. Riordan, J.W.
Hanrahan, J. Memb. Biol., in press) but the CF cells were
insensitive to activation by cyclic AMP (T.J. Jensen,
J.W. Hanrahan, J.A. Tabcharani, M. Buchwald and J.R.
Riordan, Pediatric Pulmonology, Supplement 2, 100, 1988).
RNA was isolated from them by the method of J.M. Chirgwin
et al (Biochemistry 18, 5294, 1979). Poly A+ RNA was
selected (H. Aviv and P. Leder, Proc. Natl. Acad. Sci.
USA 69, 1408, 1972) and used as template for the
synthesis of cDNA with oligo (dT) 12-18 as a primer. The
second strand was synthesized according to Gubler and
Hoffman (Gene 25, 263, 1983). This was methylated with
Eco RI methylase and ends were made flush with T4 DNA
polymerase. Phosphorylated Eco RI linkers were ligated
to the cDNA and restricted with Eco RI. Removal of
excess linkers and partial size fractionation was
achieved by Biogel A-50 chromatography. The cDNAs were
then ligated into the Eco RI site of the commercially
available lambda ZAP. Recombinants were packaged and
propagated in E. coli BB4. Portions of the packaging
mixes were amplified and the remainder retained for
screening prior to amplification. The same procedures
were used to construct a library from RNA isolated from
preconfluent cultures of the T-84 colonic carcinoma cell
line (Dharmsathaphorn, K. et al, Am. J. Physiol. 246,
G204, 1984). The numbers of independent recombinants in
the three libraries were: 2 x 10^6 for the non-CF sweat
gland cells, 4.5 x 10^6 for the CF sweat gland cells and
3.2 x 10^6 from T-84 cells. These phages were plated at
50,000 per 15 cm plate and plaque lifts made using nylon
membranes (Biodyne) and probed with DNA fragments
labelled with 32P using DNA polymerase I and a random
mixture of oligonucleotides as primer. Hybridization
conditions were according to G.M. Wahl and S.L. Berger
(Meth. Enzymol. 152, 415, 1987). Bluescript™ plasmids
were rescued from plaque purified clones by excision with
M13 helper phage. The lung and pancreas libraries were
purchased from Clontech Lab Inc. with reported sizes of
1.4 x 10^[illegible] and 1.7 x 10^[illegible] independent
clones.

After screening 7 different libraries each
containing 1 x 10^5 - 5 x 10^[illegible] independent
clones, a single clone (identified as 10-1) was isolated
with H1.6 from a cDNA library made from the cultured
sweat gland epithelial cells of an unaffected (non-CF)
individual.

DNA sequencing analysis showed that probe 10-1
contained an insert of 920 bp in size and one potential,
long open reading frame (ORF). Since one end of the
sequence shared perfect sequence identity with H1.6, it
was concluded that the cDNA clone was probably derived
from this region. The DNA sequence in common was,
however, only 113 bp long (see Figures 1 and 7). As
detailed below, this sequence in fact corresponded to the
5'-most exon of the putative CF gene. The short sequence
overlap thus explained the weak hybridization signals in
library screening and inability to detect transcripts in
RNA gel-blot analysis. In addition, the orientation of
the transcription unit was tentatively established on the
basis of alignment of the genomic DNA sequence with the
presumptive ORF of 10-1.
Since the corresponding transcript was estimated to
be approximately 6500 nucleotides in length by RNA
gel-blot hybridization experiments, further cDNA library
screening was required in order to clone the remainder of
the coding region. As a result of several successive
screenings with cDNA libraries generated from the colonic
carcinoma cell line T84, normal and CF sweat gland cells,
pancreas and adult lungs, 18 additional clones were
isolated (Figure 7, as subsequently discussed in greater
detail). DNA sequence analysis revealed that none of
these cDNA clones corresponded to the length of the
observed transcript, but it was possible to derive a
consensus sequence based on overlapping regions.
Additional cDNA clones corresponding to the 5' and 3'
ends of the transcript were derived from 5' and 3'
primer-extension experiments. Together, these clones
span a total of about 6.1 kb and contain an ORF capable
of encoding a polypeptide of 1480 amino acid residues
(Figure 1).

It was unusual to observe that most of the cDNA
clones isolated here contained sequence insertions at
various locations of the restriction map of Figure 7.
The map details the genomic structure of the CF gene.
Exon/intron boundaries are given where all cDNA clones
isolated are schematically represented on the upper half
of the figure. Many of these extra sequences clearly
corresponded to intron regions reverse transcribed
during the construction of the cDNA, as revealed upon
alignment with genomic DNA sequences.

Since the number of recombinant cDNA clones for the
CF gene detected in the library screening was much less
than would have been expected from the abundance of
transcript estimated from RNA hybridization experiments,
it seemed probable that the clones that contained
aberrant structures were preferentially retained while
the proper clones were lost during propagation.
Consistent with this interpretation, poor growth was
observed for the majority of the recombinant clones
isolated in this study, regardless of the vector used.

The procedures used to obtain the 5' and 3' ends of
the cDNA were similar to those described (M. Frohman et
al, Proc. Nat. Acad. Sci. USA, 85, 8998-9002, 1988). For
the 5' end clones, total pancreas and T84 poly A+ RNA
samples were reverse transcribed using a primer, (10b),
which is specific to exon 2, similarly as has been
described for the primer extension reaction except that
radioactive tracer was included in the reaction. The
fractions collected from an agarose bead column of the
first strand synthesis were assayed by polymerase chain
reaction (PCR) of eluted fractions. The oligonucleotides
used were within the 10-1 sequence (145 nucleotides
apart) just 5' of the extension primer. The earliest
fractions yielding PCR product were pooled and
concentrated by evaporation and subsequently tailed with
terminal deoxynucleotidyl transferase (BRL Labs.) and
dATP as recommended by the supplier (BRL Labs). A second
strand synthesis was then carried out with Taq Polymerase
(Cetus, AmpliTaq™) using an oligonucleotide containing a
tailed linker sequence 5' CGGAATTCTCGAGATC(T)12 3'.

Amplification by an anchored PCR experiment using
the linker sequence and a primer just internal to the
extension primer which possessed the Eco RI restriction
site at its 5' end was then carried out. Following
restriction with the enzymes Eco RI and Bgl II and
agarose gel purification, size selected products were
cloned into the plasmid Bluescript KS available from
Stratagene by standard procedures (Maniatis et al,
supra). Essentially all of the recovered clones
contained inserts of less than 350 nucleotides. To
obtain the 3' end clones, first strand cDNA was prepared
with reverse transcription of 2 µg T84 poly A+ RNA using
the tailed linker oligonucleotide previously described
with conditions similar to those of the primer extension.
Amplification by PCR was then carried out with the linker
oligonucleotide and three different oligonucleotides
corresponding to known sequences of clone T16-4.5. A
preparative scale reaction (2 x 100 µl) was carried out
with one of these oligonucleotides with the sequence
5' ATGAAGTCCAAGGATTTAG 3'.

This oligonucleotide is approximately 70 nucleotides
upstream of a Hind III site within the known sequence of
T16-4.5. Restriction of the PCR product with Hind III
and Xho I was followed by agarose gel purification to
size select a band at 1.0-1.4 kb. This product was then
cloned into the plasmid Bluescript KS available from
Stratagene. Approximately 20% of the obtained clones
hybridized to the 3' end portion of T16-4.5. 10/10 of
plasmids isolated from these clones had identical
restriction maps with insert sizes of approx. 1.2 kb.
All of the PCR reactions were carried out for 30 cycles
in buffer suggested by an enzyme supplier.
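
The choice of restriction enzymes in the two anchored PCR cloning steps can be checked against the tailed linker sequence 5' CGGAATTCTCGAGATC(T)12 3'. The sketch below is illustrative only: the function name `find_sites` and the dictionary `SITES` are introduced here, and the recognition sequences used (Eco RI GAATTC, Xho I CTCGAG, Bgl II AGATCT, Hind III AAGCTT) are the standard ones rather than values stated in this text.

```python
SITES = {
    "EcoRI": "GAATTC",
    "XhoI": "CTCGAG",
    "BglII": "AGATCT",
    "HindIII": "AAGCTT",
}


def find_sites(seq: str, sites: dict = SITES) -> dict:
    """Return 1-based start positions of each recognition sequence in seq."""
    s = seq.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        i = s.find(site)
        while i != -1:
            positions.append(i + 1)  # report 1-based coordinates
            i = s.find(site, i + 1)
        hits[name] = positions
    return hits


# Tailed linker as given in the text: 16 nt of linker followed by (T)12.
linker = "CGGAATTCTCGAGATC" + "T" * 12
print(find_sites(linker))
```

On this sequence the Eco RI and Xho I sites overlap by one nucleotide, and the Bgl II site is completed by the first T of the tail. The linker therefore supplies the Bgl II end used for the 5' clones and the Xho I end used for the 3' clones, while the Hind III end of the 3' product comes from the T16-4.5 sequence itself.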

An extension primer positioned 157 nt from the 5' end
of 10-1 clone was used to identify the start point of the
putative CF transcript. The primer was end labelled with
γ-[32P]ATP at 5000 Curies/mmole and T4 polynucleotide
kinase and purified by spun column gel filtration. The
radiolabeled primer was then annealed with 4-5 µg poly A+
RNA prepared from T-84 colonic carcinoma cells in 2X
reverse transcriptase buffer for 2 hrs. at 60 °C.
Following dilution and addition of AMV reverse
transcriptase (Life Sciences, Inc.) incubation at 41 °C
proceeded for 1 hour. The sample was then adjusted to
0.4 M NaOH and 20 mM EDTA, and finally neutralized with
NH4OAc, pH 4.6, phenol extracted, ethanol precipitated,
redissolved in buffer with formamide, and analyzed on a
polyacrylamide sequencing gel. Details of these methods
have been described (Meth. Enzymol. 152, 1987, Ed. S.L.
Berger, A.R. Kimmel, Academic Press, N.Y.).

Results of the primer extension experiment using an
extension oligonucleotide primer starting 157 nucleotides
from the 5' end of 10-1 are shown in Panel A of Figure 10.
End labelled φX174 bacteriophage digested with Hae III
(BRL Labs) is used as size marker. Two major products
are observed at 216 and 100 nucleotides. The sequence
corresponding to 100 nucleotides in 10-1 corresponds to a
very GC rich sequence (11/12) suggesting that this could
be a reverse transcriptase pause site. The 5' anchored
PCR results are shown in panel B of Figure 10. The 1.4%
agarose gel shown on the left was blotted and transferred
to Zetaprobe™ membrane (Bio-Rad Lab). DNA gel blot
hybridization with radiolabeled 10-1 is shown on the
right. The 5' extension products are seen to vary in
size from 170-280 nt with the major product at about 200
nucleotides. The PCR control lane shows a fragment of
145 nucleotides. It was obtained by using the test
oligomers within the 10-1 sequence. The size markers
shown correspond to sizes of 154, 220/210, 298, 344, 394
nucleotides (1 kb ladder purchased from BRL Lab).

The schematic shown below Panel B of Figure 10
outlines the procedure to obtain double stranded cDNA
used for the amplification and cloning to generate the
clones PA3-5 and TB2-7 shown in Figure 7. The anchored
PCR experiments to characterize the 3' end are shown in
panel C. As depicted in the schematic below Figure 10C,
three primers whose relative position to each other were
known were used for amplification with reverse
transcribed T84 RNA as described. These products were
separated on a 1% agarose gel and blotted onto nylon
membrane as described above. DNA-blot hybridization with
the 3' portion of the T16-4.5 clone yielded bands of
sizes that corresponded to the distance between the
specific oligomer used and the 3' end of the transcript.
These bands in lanes 1, 2 and 3 are shown schematically
below Panel C in Figure 10. The band in lane [illegible]
is weak as only 60 nucleotides of this segment overlaps
with the probe used. Also indicated in the schematic and
as shown [illegible] is the product generated by
restriction of the anchored PCR product to facilitate
cloning to generate the THZ-4 clone shown in Figure 7.

DNA-blot hybridization analysis of genomic DNA
digested with EcoRI and Hind III enzymes probed with
portions of cDNAs spanning the entire transcript suggests
that the gene contains at least 26 exons numbered as
Roman numerals I through XXVI (see Figure 9). These
correspond to the numbers 1 through 26 shown in Figure 7.
The size of each band is given in [illegible].

In Figure 7, open boxes indicate approximate
positions of the 24 exons which have been identified by
the isolation of >22 clones from the screening of cDNA
libraries and from anchored PCR experiments designed to
[illegible] genomic fragments detected by each exon is
also indicated. The hatched boxes in Figure 7 indicate
the [illegible] intron sequences [illegible] boxes
indicate other sequences. Depicted in the lower left by
the closed box is the relative position of the clone H1.6
used to detect the first cDNA clone 10-1 from
[illegible]. [illegible] Figures 4(D) and 7, the genomic
clone H1.6 partially overlaps with an EcoRI fragment of
4.3 kb. All of the cDNA clones shown were hybridized to
genomic DNA and/or were fine restriction mapped.
Examples of the restriction sites occurring within the
cDNAs and in the corresponding genomic fragments are
indicated.

With reference to Figure 9, the hybridization
analysis includes probes; i.e., cDNA clones 10-1 for
panel A, T16-1 (3' portion) for panel B, T16-4.5 (central
portion) for panel C and T16-4.5 (3' end portion) for
panel D. In panel A of Figure 9, the cDNA probe 10-1
detects the genomic bands for exons I through VI. The 3'
portion of T16-1 generated by Nru I restriction detects
exons IV through XIII as shown in Panel B. This probe
partially overlaps with 10-1. Panels C and D,
respectively, show genomic bands detected by the central
and 3' end EcoRI fragments of the clone T16-4.5. Two
EcoRI sites occur within the cDNA sequence and split
exons XIII and XIX. As indicated by the exons in
parentheses, two genomic EcoRI bands correspond to each
of these exons. Cross hybridization to other genomic
fragments was observed. These bands, indicated by N, are
not of chromosome 7 origin as they did not appear in
human-hamster hybrids containing human chromosome 7. The
faint band in panel D indicated by XI in brackets is
believed to be caused by the cross-hybridization of
sequences due to internal homology with the cDNA.

Since 10-1 detected a strong band on gel blot
hybridization of RNA from the T-84 colonic carcinoma cell
line, this cDNA was used to screen the library
constructed from that source. Fifteen positives were
obtained from which clones T6, T6/20, T11, T16-1 and
T13-1 were purified and sequenced. Rescreening of the
same library with a 0.75 kb Bam HI - Eco RI fragment from
the 3' end of T16-1 yielded T16-4.5. A 1.8 kb EcoRI
fragment from the 3' end of T16-4.5 yielded T8-B3 and
T12a, the latter of which contained a polyadenylation
signal and tail. Simultaneously a human lung cDNA
library was screened; many clones were isolated including
those shown here with the prefix 'CDL'. A pancreas
library was also screened, yielding clone CDPJ5.

To obtain copies of this transcript from a CF
patient, a cDNA library from RNA of sweat gland
epithelial cells from a patient was screened with the
0.75 kb BamHI-EcoRI fragment from the 3' end of T16-1
and clones C16-1 and C1-1/5, which covered all but exon
1, were isolated. These two clones both exhibit a 3 bp
deletion in exon 10 which is not present in any other
clone containing that exon. Several clones, including
CDLS26-1 from the lung library and T6/20 and T13-1
isolated from T84, were derived from partially processed
transcripts. This was confirmed by genomic hybridization
and by sequencing across the exon-intron boundaries for
each clone. T11 also contained additional sequence at
each end. T16-4.5 contained a small insertion near the
boundary between exons 10 and 11 that did not correspond
to intron sequence. Clones CDLS16A, 11a and 13a from the
lung library also contained extraneous sequences of
unknown origin. The clone C16-1 also contained a short
insertion corresponding to a portion of the γδ-transposon
of E. coli; this element was not detected in the other
clones. The 5' clones, PA3-5 generated from pancreas RNA
and TB2-7 generated from T84 RNA using the anchored PCR
technique, have identical sequences except for a single
nucleotide difference in length at the 5' end as shown in
Figure 1. The 3' clone, THZ-4, obtained from T84 RNA,
contains the 3' sequence of the transcript in concordance
with the genomic sequence of this region.

A combined sequence representing the presumptive
coding region of the CF gene was generated from
overlapping cDNA clones. Since most of the cDNA clones
were apparently derived from unprocessed transcripts,
further studies were performed to ensure the authenticity
of the combined sequence. Each cDNA clone was first
tested for localization to chromosome 7 by hybridization
analysis with a human-hamster somatic cell hybrid
containing a single human chromosome 7 and by pulsed
field gel electrophoresis. Fine restriction enzyme
mapping was also performed for each clone. While
overlapping regions were clearly identifiable for most of
the clones, many contained regions of unique restriction
patterns.

To further characterize these cDNA clones, they were
used as probes in gel hybridization experiments with
EcoRI- or HindIII-digested human genomic DNA. As shown in
Figure 9, five to six different restriction fragments
could be detected with the 10-1 cDNA and a similar number
of fragments with other cDNA clones, suggesting the
presence of multiple exons for the putative CF gene. The
hybridization studies also identified those cDNA clones
with unprocessed intron sequences as they showed
preferential hybridization to a subset of genomic DNA
fragments. For the confirmed cDNA clones, their
corresponding genomic DNA segments were isolated and the
exons and exon/intron boundaries sequenced. As indicated
in Figure 7, at least 27 exons have been identified,
including split exons 6a, 6b, 14a, 14b and 17a, 17b.
Based on this information and the results of physical
mapping experiments, the gene locus was estimated to span
250 kb on chromosome 7.

THE SEQUENCE

Figure 1 shows the nucleotide sequence of the cloned
cDNA encoding CFTR together with the deduced amino acid
sequence. The first base position corresponds to the
first nucleotide in the 5' extension clone PA3-5, which is
one nucleotide longer than TB2-7. Arrows indicate the
position of the transcription initiation site by primer
extension analysis. Nucleotide 6129 is followed by a
poly(dA) tract. Positions of exon junctions are
indicated by vertical lines. Potential membrane-spanning
segments were ascertained using the algorithm of
Eisenberg et al, J. Mol. Biol. 179:125 (1984). Potential
membrane-spanning segments as analyzed and shown in
Figure 11 are enclosed in boxes in Figure 1. In Figure
11, the mean hydropathy index [Kyte and Doolittle, J.
Molec. Biol. 157:105 (1982)] of 9 residue peptides is
plotted against the amino acid number. The corresponding
positions of features of secondary structure predicted
according to Garnier et al [J. Molec. Biol. 157:165
(1982)] are indicated in the lower panel. Amino acids
comprising putative ATP-binding folds are underlined in
Figure 1. Possible sites of phosphorylation by protein
kinases A (PKA) or C (PKC) are indicated by open and
closed circles, respectively. The open triangle is over
the 3 bp (CTT) which are deleted in CF (see discussion
below). The cDNA clones in Figure 1 were sequenced by
the dideoxy chain termination method employing 35S
labelled nucleotides or by the Dupont Genesis 2000™
automatic DNA sequencer.

The combined cDNA sequence spans 6129 base pairs
excluding the poly(A) tail at the end of the 3'
untranslated region and it contains an ORF capable of
encoding a polypeptide of 1480 amino acids (Figure 1).
An ATG (AUG) triplet is present at the beginning of this
ORF (base position 133-135). Since the nucleotide
sequence surrounding this codon (5'-AGACCAUGCA-3') has
the proposed features of the consensus sequence
(CC)A/GCCAUGG(G) of an eukaryotic translation initiation
site with a highly conserved A at the -3 position, it is
highly probable that this AUG corresponds to the first
methionine codon for the putative polypeptide.
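
The coordinates quoted above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the helper names `orf_span` and `kozak_context` are hypothetical, and it assumes that the ORF is contiguous in the combined cDNA and that the stop codon follows the 1480th codon directly. The 3' untranslated length it prints is derived from those assumptions and is not a figure stated in the text.

```python
# Sketch only: helper names are hypothetical; the numbers come from the text.

def orf_span(start: int, n_residues: int) -> tuple[int, int]:
    """Return (first, last) base of an ORF, counting the stop codon."""
    last = start + 3 * (n_residues + 1) - 1
    return start, last


def kozak_context(site: str, aug_offset: int) -> dict[str, str]:
    """Report the bases at -3 and +4 relative to the A of AUG (A = +1)."""
    if site[aug_offset:aug_offset + 3] != "AUG":
        raise ValueError("no AUG at the given offset")
    return {"-3": site[aug_offset - 3], "+4": site[aug_offset + 3]}


first, last = orf_span(133, 1480)
print("ORF spans bases", first, "to", last)
print("implied 3' untranslated length:", 6129 - last)
print("initiation context:", kozak_context("AGACCAUGCA", 5))
```

The context check reports an A at the -3 position, which is the conserved feature the paragraph relies on.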

To obtain the sequence corresponding to the 5' end
of the transcript, a primer-extension experiment was
performed, as described earlier. As shown in Figure 10A,
a primer extension product of approximately 216
nucleotides could be observed, suggesting that the 5' end
of the transcript initiated approximately 60 nucleotides
upstream of the end of cDNA clone 10-1. A modified
polymerase chain reaction (anchored PCR) was then used to
facilitate cloning of the 5'-end sequences (Figure 10B).
Two independent 5'-extension clones, one from pancreas
and the other from T84 RNA, were characterized by DNA
sequencing and were found to differ by only 1 base in
length, indicating the most probable initiation site for
the transcript as shown in Figure 1.

Since most of the initial cDNA clones did not
contain a polyA tail indicative of the end of a mRNA,
anchored PCR was also applied to the 3' end of the
transcript (Frohman et al, 1988, supra). Three
3'-extension oligonucleotides were made to the terminal
portion of the cDNA clone T16-4.5. As shown in Figure
10C, 3 PCR products of different sizes were obtained.
All were consistent with the interpretation that the end
of the transcript lies downstream of
the HindIII site at nucleotide position 5027 (see Figure
1). The DNA sequence derived from representative clones
was in agreement with that of the T84 cDNA clone T12a
(see Figures 1 and 7) and the sequence of the
corresponding 2.3 kb EcoRI genomic fragment.

3.1 SITES OF EXPRESSION

To visualize the transcript for the putative CF
gene, RNA gel blot hybridization experiments were
performed with the 10-1 cDNA as probe. The RNA
hybridization results are shown in Figure 8.

RNA samples were prepared from tissue samples
obtained from surgical pathology or at autopsy according
to methods previously described (A.R. Kimmel, S.L.
Berger, eds., Meth. Enzymol. 152, 1987). Formaldehyde
gels were transferred onto nylon membranes (Zetaprobe™;
Bio-Rad Lab). The membranes were then hybridized with DNA
probes labeled to high specific activity by the random
priming method (A.P. Feinberg and B. Vogelstein, Anal.
Biochem. 132, 6, 1983) according to previously published
procedures (J. Rommens et al, Am. J. Hum. Genet. 43,
645-663, 1988). Figure 8 shows hybridization by the cDNA
clone 10-1 to a 6.5 kb transcript in the tissues
indicated. Total RNA (10 µg) of each tissue, and poly A+
RNA of the T84 colonic carcinoma cell line were
separated on a 1% formaldehyde gel. Arrows indicate
the position of transcripts. Sizing was established by
comparison to standard RNA markers (BRL Labs). T84 is a
human colon cancer cell line.

Analysis reveals a prominent band of approximately
6.5 kb in size in T84 cells. Similar, strong
hybridization signals were also detected in pancreas and
primary cultures of cells from nasal polyps, suggesting 
that the mature mRNA of the putative CF gene is 
approximately 6.5 kb. Minor hybridization signals,
probably representing degradation products, were detected
at the lower size ranges but they varied between
different experiments. Identical results were obtained
with other cDNA clones as probes. Based on the
hybridization band intensity and comparison with those
detected for other transcripts under identical
experimental conditions, it was estimated that the
putative CF transcripts constituted approximately 0.01%
of total mRNA in T84 cells.

A number of other tissues were also surveyed by RNA
gel blot hybridization analysis in an attempt to
correlate the expression pattern of the 10-1 gene and the
pathology of CF. As shown in Figure 8, transcripts, all
of identical size, were found in lung, colon, sweat
glands (cultured epithelial cells), placenta, liver, and
parotid gland, but the signal intensities in these tissues
varied among different preparations and were generally
weaker than that detected in the pancreas and nasal
polyps. For example, hybridization in kidney was not
detected in the preparation shown in Figure 8, but could
be discerned in subsequent repeated assays. No
hybridization signals could be discerned in the brain or
adrenal gland (Figure 8), nor in skin fibroblast and
lymphoblast cell lines.

In summary, expression of the CF gene appeared to 
occur in many of the tissues examined, with higher levels 
in those tissues severely affected in CF. While this 
epithelial tissue-specific expression pattern is in good 
agreement with the disease pathology, no significant 
difference has been detected in the amount or size of 
transcripts from CF and control tissues, consistent with 
the assumption that CF mutations are subtle changes at 
the nucleotide level. 




3.2 THE MAJOR CF MUTATION

Figure 16 shows the DNA sequence at the F508
deletion. On the left is the reverse complement of the
sequence from base position 1649-1664 of the normal
sequence (as derived from the cDNA clone T16). The
nucleotide sequence is displayed as the output (in
arbitrary fluorescence intensity units, y-axis) plotted
against time (x-axis) for each of the 2 photomultiplier
tubes (PMT #1 and #2) of a Dupont Genesis 2000™ DNA
analysis system. The corresponding nucleotide sequence
is shown underneath. On the right is the same region
from a mutant sequence (as derived from the cDNA clone
C16). Double-stranded plasmid DNA templates were
prepared by the alkaline lysis procedure. Five µg of
plasmid DNA and 75 ng of oligonucleotide primer were used
in each sequencing reaction according to the protocol
recommended by Dupont except that the annealing was done
at 45 °C for 30 min and that the elongation/termination
step was for 10 min at 42 °C. The unincorporated
fluorescent nucleotides were removed by precipitation of
the DNA sequencing reaction product with ethanol in the
presence of 2.5 M ammonium acetate at pH 7.0 and rinsed
one time with 70% ethanol. The primer used for the T16-1
sequencing was a specific oligonucleotide
5'GTTGGCATGCTTTGATGACGCTTC3' spanning base position
1708 - 1731 and that for C16-1 was the universal primer
SK for the Bluescript vector (Stratagene).

Figure 17 also shows the DNA sequence around the
F508 deletion, as determined by manual sequencing. The
normal sequence from base position 1726-1651 (from cDNA
T16-1) is shown beside the CF sequence (from cDNA C16-1).
The left panel shows the sequences from the coding
strands obtained with the B primer
(5'GTTTTCCTGGATTATGCCTGGCAC3') and the right panel those
from the opposite strand with the D primer
(5'GTTGGCATGCTTTGATGACGCTTC3'). The brackets indicate
the three nucleotides in the normal that are absent in CF
(arrowheads). Sequencing was performed as described in
F. Sanger, S. Nicklen, A. R. Coulson, Proc. Nat. Acad.
Sci. U. S. A. 74: 5463 (1977).
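
Because the D primer reads the opposite strand, its reverse complement gives the coding-strand sequence it anneals to. The short Python sketch below illustrates that relationship and checks that the primer length agrees with the quoted span of base positions 1708 to 1731. The function name `reverse_complement` is a hypothetical helper, and the sketch assumes the two primer sequences exactly as printed above.

```python
# Sketch only: primer sequences are copied from the text; helper is hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


PRIMER_B = "GTTTTCCTGGATTATGCCTGGCAC"  # coding-strand primer
PRIMER_D = "GTTGGCATGCTTTGATGACGCTTC"  # opposite-strand primer

span_length = 1731 - 1708 + 1  # inclusive span quoted for the D primer
print(len(PRIMER_B), len(PRIMER_D), span_length)
print(reverse_complement(PRIMER_D))  # coding-strand sequence under primer D
```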

The extensive genetic and physical mapping data have
directed molecular cloning studies to focus on a small
segment of DNA on chromosome 7. Because of the lack of
chromosome deletions and rearrangements in CF and the
lack of a well-developed functional assay for the CF gene
product, the identification of the CF gene required a
detailed characterization of the locus itself and
comparison between the CF and normal (N) alleles.
Random, phenotypically normal individuals could not be
included as controls in the comparison due to the high
frequency of symptomless carriers in the population. As
a result, only parents of CF patients, each of whom by
definition carries an N and a CF chromosome, were
suitable for the analysis. Moreover, because of the
strong allelic association observed between CF and some
of the closely linked DNA markers, it was necessary to
exclude the possibility that sequence differences
detected between N and CF were polymorphisms associated
with the disease locus.

3.3 IDENTIFICATION OF RFLPs AND FAMILY STUDIES

To determine the relationship of each of the DNA
segments isolated from the chromosome walking and jumping
experiments to CF, restriction fragment length
polymorphisms (RFLPs) were identified and used to study
families where crossover events had previously been
detected between CF and other flanking DNA markers. As
shown in Figure 14, a total of 18 RFLPs were detected in
the 500 kb region; 17 of them (from E6 to CE1.0) are listed
in Table 2; some of them correspond to markers previously
reported.

Five of the RFLPs, namely those detected by 10-1X.6 (two
sites), T6/20, H1.3 and CE1.0, were identified with cDNA
and genomic DNA probes
derived from the putative CF gene. The RFLP data are
presented in Table 2, with markers in the MET and D7S8
regions included for comparison. The physical distances
between these markers as well as their relationship to
the MET and D7S8 regions are shown in Figure 14.

NOTES FOR TABLE 2

(a) The number of N and CF-PI (CF with pancreatic
insufficiency) chromosomes were derived from the
parents in the families used in linkage analysis
[Tsui et al, Cold Spring Harbor Symp. Quant. Biol.
51:325 (1986)].

(b) Standardized association (A), which is less
influenced by the fluctuation of DNA marker allele
distribution among the N chromosomes, is used here
for the comparison. Yule's association coefficient is
A = (ad-bc)/(ad+bc), where a, b, c, and d are the
number of N chromosomes with DNA marker allele 1, CF
with 1, N with 2, and CF with 2, respectively.
Relative risk can be calculated using the
relationship RR = (1+A)/(1-A) or its reverse.

(c) Allelic association (Δ), calculated according to A.
Chakravarti et al, Am. J. Hum. Genet. 36:1239
(1984), assuming the frequency of 0.02 for CF
chromosomes in the population, is included for
comparison.
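
The two formulas in note (b) can be sketched in Python as follows. The function names are hypothetical and the four counts in the example are invented for illustration only; they are not values from Table 2. The sketch also shows why the relationship in note (b) holds: (1+A)/(1-A) reduces algebraically to ad/bc.

```python
# Sketch only: function names and the example counts are hypothetical.

def yule_association(a: int, b: int, c: int, d: int) -> float:
    """Yule's coefficient A = (ad - bc) / (ad + bc).

    a = N chromosomes with allele 1, b = CF with allele 1,
    c = N chromosomes with allele 2, d = CF with allele 2.
    """
    return (a * d - b * c) / (a * d + b * c)


def relative_risk(assoc: float) -> float:
    """RR = (1 + A) / (1 - A), which equals ad / bc."""
    return (1 + assoc) / (1 - assoc)


# Invented counts, chosen only to make the arithmetic easy to follow.
a, b, c, d = 40, 5, 20, 60
A = yule_association(a, b, c, d)
print(A, relative_risk(A), (a * d) / (b * c))
```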



Because of the small number of recombinant families
available for the analysis, as was expected from the
close distance between the markers studied and CF, and
the possibility of misdiagnosis, alternative approaches
were necessary in further fine mapping of the CF gene.

3.4 ALLELIC ASSOCIATION

Allelic association (linkage disequilibrium) has
been detected for many closely linked DNA markers. While
the utility of using allelic association for measuring
genetic distance is uncertain, an overall correlation has
been observed between CF and the flanking DNA markers. A
strong association with CF was noted for the closer DNA
markers, D7S23 and D7S122, whereas little or no
association was detected for the more distant markers
MET, D7S8 or D7S424 (see Figure 1).

As shown in Table 2, the degree of association
between DNA markers and CF (as measured by Yule's
association coefficient) increased from 0.35 for metH and
0.17 for J32 to 0.91 for 10-1X.6 (only CF-PI patient
families were used in the analysis as they appeared to be
genetically more homogeneous than CF-PS). The
association coefficients appeared to be rather constant
over the 300 kb from EG1.4 to H1.3; the fluctuations
detected at several locations, most notably at H2.3A,
E4.1 and T6/20, were probably due to the variation in the
allelic distribution among the N chromosomes (see Table
2). These data are therefore consistent with the result
from the study of recombinant families (see Figure 14).
A similar conclusion could also be made by inspection of
the extended DNA marker haplotypes associated with the CF
chromosomes (see below). However, the strong allelic
association detected over the large physical distance
between EG1.4 and H1.3 did not allow further refined
mapping of the CF gene. Since J44 was the last genomic
DNA clone isolated by chromosome walking and jumping
before a cDNA clone was identified, the strong allelic
association detected for the JG2E1-J44 interval prompted
us to search for candidate gene sequences over this
entire interval. It is of interest to note that the
highest degree of allelic association was, in fact,
detected between CF and the 2 RFLPs detected by 10-1X.6,
a region near the major CF mutation.

Table 3 shows pairwise allelic association between
DNA markers closely linked to CF. The average number of
chromosomes used in these calculations was 75-80 and only
chromosomes from CF-PI families were used in scoring CF
chromosomes. Similar results were obtained when Yule's
standardized association (A) was used.

Strong allelic association was also detected among
subgroups of RFLPs on both the CF and N chromosomes. As
shown in Table 3, the DNA markers that are physically
close to each other generally appeared to have strong
association with each other. For example, strong (in
some cases almost complete) allelic association was
detected between adjacent markers E6 and E7, between
pH131 and W3D1.4, between the AccI and HaeIII polymorphic
sites detected by 10-1X.6 and amongst EG1.4, JG2E1,
E2.6(E.9), E2.8 and E4.1. The two groups of distal
markers in the MET and D7S8 region also showed some
degree of linkage disequilibrium among themselves but
they showed little association with markers from E6 to
CE1.0, consistent with the distant locations for MET and
D7S8. On the other hand, the lack of association between
DNA markers that are physically close may indicate the
presence of recombination hot spots. Examples of these
potential hot spots are the region between E7 and pH131,
around H2.3A, between J44 and the regions covered by the
probes 10-1X.6 and T6/20 (see Figure 14). These regions,
containing frequent recombination breakpoints, were
useful in the subsequent analysis of extended haplotype
data for the CF region.

3.5 HAPLOTYPE ANALYSIS

Extended haplotypes based on 23 DNA markers were 
generated for the CF and N chromosomes in the collection 
of families previously used for linkage analysis. 
Assuming recombination between chromosomes of different 
haplotypes, it was possible to construct several lineages 
of the observed CF chromosomes and, also, to predict the 
location of the disease locus. 

To obtain further information useful for
understanding the nature of different CF mutations, the
F508 deletion data were correlated with the extended DNA
marker haplotypes. As shown in Table 4, five major
groups of N and CF haplotypes could be defined by the
RFLPs within or immediately adjacent to the putative CF
gene (regions 6-8).

TABLE 4 (continued)
(a) The extended haplotype data are derived from the CF
families used in previous linkage studies (see footnote
(a) of Table 3) with additional CF-PS families collected
subsequently (Kerem et al, Am. J. Hum. Genet. 44:827 (1989)).
The data are shown in groups (regions) to reduce space.
The regions are assigned primarily according to pairwise
association data shown in Table 4 with regions 6-8
spanning the putative CF locus (the F508 deletion is
between regions 6 and 7). A dash (-) is shown at the
region where the haplotype has not been determined due to
incomplete data or inability to establish phase.
Alternative haplotype assignments are also given where
data are incomplete. Unclassified includes those
chromosomes with more than 3 unknown assignments. The
haplotype definitions for each of the 9 regions are:





Region 1-   metD    metD    metH
            BanI    TaqI    TaqI
     A =    1       1       1
     B =    2       1       2
     C =    1       1       2
     D =    2       2       1
     E =    1       2
     F =    2       1       1
     G =    2       2       2

Region 2-   E6      E7      pH131   W3D1.4
            TaqI    TaqI    HinfI   HindIII
     A =    1       2       2       2
     B =    2       1       1       1
     C =    1       2       1       1
     D =    2       1       2       2
     E =    2       2       2       1
     F =    2       2       1       1
     G =    1       2       1       2
     H =    1       1       2       2

Region 3-   H2.3A
            TaqI
     A =    1
     B =    2

Region 4-   EG1.4   EG1.4   JG2E1
            HincII  BglI    PstI
     A =    1       1       2
     B =    2       2       1
     C =    2       2       2
     D =    1       1       1
     E =    1       2       1

Region 5-   E2.6    E2.8    E4.1
            MspI    NcoI    MspI
     A =    2       1       2
     B =    1       2       1
     C =    2       2       2

Region 6-   J44     10-1X.6 10-1X.6
            XbaI    AccI    HaeIII
     A =    1       2       1
     B =    2       1       2
     C =    1       1       2
     D =    1       2       2
     E =    2       2       2
     F =    2       2       1

Region 7-   T6/20
            MspI
     A =    1
     B =    2

Region 8-   H1.3    CE1.0
            NcoI    NdeI
     A =    2       1
     B =    1       2
     C =    1       1
     D =    2       2

Region 9-   J32     J3.11   J29
            SacI    MspI
     A =    1       1       1
     B =    2       2       2
     C =    2       1       2
     D =    2       2       1
     E =    2       1       1



(b) Number of chromosomes scored in each class:
CF-PI(F) = CF chromosomes from CF-PI patients with
the F508 deletion;
CF-PS(F) = CF chromosomes from CF-PS patients with
the F508 deletion;
CF-PI = Other CF chromosomes from CF-PI patients;
CF-PS = Other CF chromosomes from CF-PS patients;
N = Normal chromosomes derived from carrier parents

It was apparent that most recombinations between
haplotypes occurred between regions 1 and 2 and between
regions 8 and 9, again in good agreement with the
relatively long physical distance between these regions.
Other, less frequent, breakpoints were noted between
short distance intervals and they generally corresponded
to the hot spots identified by pairwise allelic
association studies as shown above. It is of interest to
note that the F508 deletion associated almost exclusively
with Group I, the most frequent CF haplotype, supporting
the position that this deletion constitutes the major
mutation in CF. More important, while the F508 deletion
was detected in 89% (62/70) of the CF chromosomes with
the AA haplotype (corresponding to the two regions, 6 and
7) flanking the deletion, it was not found in the 14
N chromosomes within the same group (χ² = 47.3, p < 10⁻⁴).
The F508 deletion was therefore not a sequence
polymorphism associated with the core of the Group I
haplotype (see Table 5).
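
The chi-square value quoted above can be reproduced from the counts given in the text (62 of 70 CF chromosomes with the deletion, 0 of 14 N chromosomes). The Python sketch below is an illustration under stated assumptions: it uses the uncorrected Pearson statistic for a 2x2 table, which is an assumption about how the figure was obtained, and the function names are hypothetical.

```python
import math

# Sketch only: assumes an uncorrected Pearson chi-square on a 2x2 table.

def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the table [[a, b], [c, d]], 1 degree of freedom."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Rows: CF and N chromosomes with the AA haplotype.
# Columns: deletion present, deletion absent.
chi2 = chi_square_2x2(62, 8, 0, 14)
print(round(chi2, 1), p_value_1df(chi2))
print(round(100 * 62 / 70))  # percentage of CF chromosomes with the deletion
```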

Together, the results of the oligonucleotide
hybridization study and the haplotype analysis support
the conclusion that the gene locus described here is the CF
gene and that the 3 bp (F508) deletion is the most common
mutation in CF.
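
The effect of the 3 bp deletion on the reading frame can be illustrated with a short Python sketch. The 15-nucleotide window used here is an assumption: it is the commonly published normal coding sequence around codons 506 to 510 and is not reproduced in this section. The codon table holds only the entries needed for that window, and the helper names are hypothetical.

```python
# Sketch only. NORMAL is an assumed window of the normal coding strand
# (codons 506-510 of the published sequence), not taken from this section.
NORMAL = "ATCATCTTTGGTGTT"

# Subset of the standard genetic code needed for this window.
CODON_TABLE = {"ATC": "I", "ATT": "I", "TTT": "F", "GGT": "G", "GTT": "V"}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string using the small table above."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def delete(dna: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based index `start`."""
    return dna[:start] + dna[start + length:]


start = NORMAL.index("CTT")  # the CTT triplet named in the text
MUTANT = delete(NORMAL, start, 3)
print(translate(NORMAL), "->", translate(MUTANT))
```

Under these assumptions the deletion removes one phenylalanine and leaves the downstream reading frame intact, which is consistent with the description of the mutation as a 3 bp deletion.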

3.6 INTRON/EXON BOUNDARIES

The entire genomic CF gene includes all of the
regulatory genetic information as well as intron genetic
information which is spliced out in the expression of the
CF gene. Portions of the introns at the intron/exon
boundaries for the exons of the CF gene are very helpful
in locating mutations in the CF gene, as they permit PCR
analysis from genomic DNA. Genomic DNA can be obtained
from any tissue including leukocytes from blood. Such
intron information can be employed in PCR analysis for
purposes of CF screening which will be discussed in more
detail in a later section. As set out in Figure 18 with
the headings "Exon 1 through Exon 24", there are portions
of the bounding introns, in particular those that flank
the exons, which are essential for PCR exon amplification.

Further assistance in interpreting the information
of Figure 18 is provided in Figure 21. Genomic DNA
clones containing the coding region of the CFTR gene are
provided. As is apparent from Figure 21, there are
considerable gaps between the clones of the exons, which
indicates the gaps in the intron portions between the
exons of Figure 18. These gaps in the intron portions
are indicated by "...". In Figure 21, the clones were
mapped using different restriction endonucleases (AccI,A;
AvaI,W; BamHI,B; BglII,G; BssHI,Y; EcoRV,V; FspI,F;
HincII,C; HindIII,H; KpnI,K; NcoI,J; PstI,P; PvuII,U;
SmaI,M; SacI,S; SspI,E; StyI,T; XbaI,X; XhoI,O). In
Figure 21, the exons are represented by boxed regions.
The open boxes indicate non-coding portions of the exons,
whereas closed boxes indicate coding portions. The
probable positions of the exons within the genomic DNA
are also indicated by their relative positions. The
arrows above the boxes mark the location of the
oligonucleotides used as sequencing primers in the PCR
amplification of the genomic DNA. The numbers provided
beneath the restriction map represent the size of the
restriction fragments in kb.

In sequencing the intron portions, it has been
determined that there are at least 27 exons instead of
the previously reported 24 exons in applicants'
aforementioned co-pending applications. Exons 6, 14 and
17, as previously reported, are found to be in segments
and are now named exons 6a, 6b, exons 14a, 14b and exons
17a, 17b.
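
The revised exon count follows directly from the renaming: 24 originally numbered exons, three of which are each split in two, give 27. A minimal Python sketch of the naming scheme is shown below; the function `exon_names` is a hypothetical helper, not part of the described method.

```python
# Sketch only: exon_names is a hypothetical helper illustrating the renaming.

def exon_names(n_original: int = 24, split: tuple = (6, 14, 17)) -> list:
    """Return exon labels, replacing each split exon n by 'na' and 'nb'."""
    names = []
    for n in range(1, n_original + 1):
        if n in split:
            names.extend([f"{n}a", f"{n}b"])
        else:
            names.append(str(n))
    return names


names = exon_names()
print(len(names), names)
```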

The intron portions, which have been used in PCR
amplification, are identified in the following Table 5
and underlined in Figure 18. The portions identified by
the arrows are selected, but it is understood that other
portions of the intron sequences are also useful in the
PCR amplification technique. For example, for exon 10
the relevant genetic information which is preferred in
PCR is noted by reference to the 5' and 3' ends of the
sequence. The intron section is identified with an "i".
Hence in Table 5 for exon 2, the preferred portions are
identified by 2i-5 and 2i-3 and similarly for exons 3
through 24. For exon 1, the selected portions include
the sequence GGA...AAA for B115-B and ACA...GTG for 10D.
For exon 13, portions are identified by two sets: 13i-5
and C1-1M and X13B-5 and 13i-3A. (This exon (13) is
large and most practical to be completed in two
sections). C1-1M and X13B-5 are from exon sequences.
The specific conditions for PCR amplification of
individual exons are summarized in the following Table 6
and are discussed in more detail hereinafter with respect
to the procedure explained in R.K. Saiki et al, Science
230:1350 (1985).

These oligonucleotides, as derived from the intron
sequence, assist in amplifying by PCR the respective
exon, thereby providing for analysis for DNA sequence
alterations corresponding to mutations of the CF gene.
The mutations can be revealed by either direct sequence
determination of the PCR products or sequencing the
products cloned in plasmid vectors. The amplified exon
can also be analyzed by use of gel electrophoresis in the
manner to be further described. It has been found that
the sections of the intron for each respective exon are
of sufficient length to work particularly well with PCR
technique to provide for amplification of the relevant
exon.

TABLE 5


Oligonucleotides used for amplification of CF gene exons by PCR

Exon     PCR primers: 5'-> 3'                     Amplified product (bp)

1        GGAGTTCACTCACCTAAA (B115-B)              933

2        CCAAATCTGTATGGAGACCA (2i-5)              378
         TATGTTGCCCAGGCTGGTAT (2i-3)

3        CTTGGGTTAATCTCCTTGGA (3i-5)              309
         ATTCACCAGATTTCGTAGTC (3i-3)

4        TCACATATGGTATGACCCTC (4i-5)              438
         TTGTACCAGCTCACTACCTA (4i-3)

5        ATTTCTGCCTAGATGCTGGG (5i-5)              395
         AACTCCGCCTTTCCAGTTGT (5i-3)

6a       TTAGTGTGCTCAGAACCACG (6Ai-5)             385
         CTATGCATAGAGCAGTCCTG (6Ai-3)

6b       TGGAATGAGTCTGTACAGCG (6Ci-5)             417
         GAGGTGGAAGTCTACCATGA (6Ci-3)

7        AGACCATGCTCAGATCTTCCAT (7i-5)            410
         GCAAAGTTCATTAGAACTGATC (7i-3)

8        TGAATCCTAGTGCTTGGCAA (8i-5)              359
         TCGCCATTAGGATGAAATCC (8i-3)

9        TAATGGATCATGGGCCATGT (9i-5)              560
         ACAGTCTTGAATGTGGTGCA (9i-3)

10       GCAGAGTACCTGAAACAGGA (10i-5)             491
         CATTCACAGTAGCTTACCCA (10i-3)

11       CAACTGTGGTTAAAGCAATAGTGT (11i-5)         425
         GCACAGATTCTGAGTAACCATAAT (11i-3)

12       GTGAATCGATGTGGTGACCA (12i-5)             426
         CTGGTTTAGCATGAGGCGGT (12i-3)

13 (a)   TGCTAAAATACGAGACATATTGCA (13i-5)         528
         ATCTGGTACTAAGGACAG (C1-1M)
   (b)   TCAATCCAATCAACTCTATACGAA (X13B-5)        497
         TACACCTTATCCTAATGCTATGAT (13i-3A)

14a      AAAAGGTATGCCACTGTTAAG (14Ai-5)           511
         GTATACATCCCCAAACTATCT (14Ai-3)

14b      GAACACCTAGTACAGCTGCT (14Bi-5)            449
         AACTCCTGGGCTCAAGTGAT (14Bi-3)

15       GTGCATGCTCTTCTAATGCA (15i-5)             485
         AAGGCACATGCCTCTGTGCA (15i-3)

16       CAGAGAAATTGGTCGTTACT (16i-5)             570
         ATCTAAATGTGGGATTGCCT (16i-3)

17a      CAATGTGCACATGTACCCTA (17Ai-5)            579
         TGTACACCAACTGTGGTAAG (17Ai-3)

17b      TTCAAAGAATGGCACCAGTGT (17Bi-5)           463
         ATAACCTATAGAATGCAGCA (17Bi-3)

18       GTAGATGCTGTGATGAACTG (18i-5)             451
         AGTGGCTATCTATGAGAAGG (18i-3)

19       GCCCGACAAATAACCAAGTGA (19i-5)            454
         GCTAACACATTGCTTCAGGCT (19i-3)

20       GGTCAGGATTGAAAGTGTGCA (20i-5)            473
         CTATCAGAAAACTGCACTGGA (20i-3)

21       AATGTTCACAAGGGACTCCA (21i-5)             477
         CAAAAGTACCTGTTGCTCCA (21i-3)

22       AAACGCTGAGGCTCACAAGA (22i-5)             562
         TGTCACCATGAAGCAGGCAT (22i-3)

23       AGCTGATTGTGCGTAACGCT (23i-5)             400
         TAAAGCTGGATGGCTGTATG (23i-3)

24       GGACACAGCAGTTAAATGTG (24i-5)             569
         ACTATTGCCAGGAAGCCATT (24i-3)
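
As a rough plausibility check on the primers of Table 5, a melting temperature can be estimated from base composition. The Python sketch below applies the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in degrees C, to primer 10i-3 as printed in the table. This rule is a general approximation for short oligonucleotides and is an assumption introduced here; the source does not say how annealing temperatures were chosen. The function names are hypothetical.

```python
# Sketch only: the Wallace rule is an approximation, not the source's method.

def wallace_tm(primer: str) -> int:
    """Estimate Tm in degrees C as 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


PRIMER_10I_3 = "CATTCACAGTAGCTTACCCA"  # primer 10i-3 from Table 5
print(len(PRIMER_10I_3), wallace_tm(PRIMER_10I_3), gc_fraction(PRIMER_10I_3))
```

An estimate a few degrees above the annealing temperature used for a given exon is what such a rule of thumb would normally lead one to expect.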




TABLE 6

                                     Thermal cycle

Exon            Buffer (a)  Initial        Denaturation   Annealing      Extension      Final
                            denaturation   time/temp      time/temp      time/temp      extension
                            time/temp                                                   time/temp

3-5, 6a, 6b,    A(1.5)      6 min/94 C     30 sec/94 C    30 sec/55 C    1 min/72 C     7 min/72 C
7-10, 12,
14a, 16, 17b,
18-24

1               B           6 min/94 C     30 sec/94 C    30 sec/55 C    15 min/72 C    7 min/72 C

2, 11           B           6 min/94 C     30 sec/94 C    30 sec/52 C    1 min/72 C     7 min/72 C

13a             A(1.75)     6 min/94 C     30 sec/94 C    30 sec/54 C    15 min/72 C    7 min/72 C

13b             A(1.75)     6 min/94 C     30 sec/94 C    30 sec/52 C    15 min/72 C    7 min/72 C

14b             B           6 min/94 C     30 sec/94 C    30 sec/56 C    1 min/72 C     7 min/72 C

17a             A(1.5)      6 min/94 C     30 sec/94 C    30 sec/56 C    1 min/72 C     7 min/72 C

(a) Buffer A(1.5): buffer A with 1.5 mM MgCl2
    Buffer A(1.75): buffer A with 1.75 mM MgCl2
    Buffer B: 67 mM Tris-HCl pH 8.8, 6.7 mM MgCl2, 16.6 mM (NH4)2SO4, 0.67 µM EDTA,
    10 mM β-mercaptoethanol, 170 µg/ml BSA, 10% DMSO, 1.5 mM of each dNTP
    Buffer A contains: 10 mM Tris pH 8.3, 50 mM KCl
    dNTPs = deoxynucleotide triphosphates

3.7 CF MUTATIONS - ΔI506 OR ΔI507

The association of the F508 deletion with 1 common
and 1 rare CF haplotype provided further insight into the
number of mutational events that could contribute to the
present patient population. Based on the extensive
haplotype data, the original chromosome in which the F508
deletion occurred is likely to carry the haplotype
-AAAAAAA- (Group Ia), as defined in Table 4. The other
Group I CF chromosomes carrying the deletion are probably
recombination products derived from the original
chromosome. If the CF chromosomes in each haplotype
group are considered to be derived from the same origin,
only 3-4 additional mutational events would be predicted
(see Table 4). However, since many of the CF chromosomes
in the same group are markedly different from each other,
further subdivision within each group is possible. As a
result, a higher number of independent mutational events
could be considered, and the data suggest that at least 7
additional, putative mutations also contribute to the
CF-PI phenotype (see Table 3). The mutations leading to
the CF-PS subgroup are probably more heterogeneous.

The 7 additional CF-PI mutations are represented by
the haplotypes: -CAAAAAA- (Group Ib), -CABCAAD- (Group
Ic), BBBAC- (Group IIa), -CABBBAB- (Group Va).
Although the molecular defect in each of these mutations
has yet to be defined, it is clear that none of these
mutations severely affects the region corresponding to the
oligonucleotide binding sites used in the
PCR/hybridization experiment.

One CF chromosome hybridizing to the ΔF508-ASO
probe, however, has been found to associate with a
different haplotype (group IIIa). It initially appeared
that ΔF508 had occurred on both haplotypes, but the
discovery of ΔI507 showed that it had not. Instead,
ΔF508 is in group Ia, whereas ΔI507 is in group IIIa.
None of the other CF chromosomes, nor the normal
chromosomes, of this haplotype group (IIIa) have shown
hybridization to the mutant (ΔF508) ASO [B. Kerem et al,
Science 245:1073 (1989)]. Because the group Ia and
IIIa haplotypes are distinctly different from each
other, the mutations harbored by these two groups of CF
chromosomes must have originated independently. To
investigate the molecular nature of the mutation in this
group IIIa CF chromosome, we further characterized the
region of interest through amplification of the genomic
DNA from an individual carrying the group IIIa chromosome
by the polymerase chain reaction (PCR).

These polymerase chain reactions (PCR) were
performed according to the procedure of R.K. Saiki et al,
Science 230:1350 (1985). A specific DNA segment of 491
bp including exon 10 of the CF gene was amplified with
the use of the oligonucleotide primers 10i-5
(5'-GCAGAGTACCTGAAACAGGA-3') and 10i-3
(5'-CATTCACAGTAGCTTACCCA-3') located in the 5' and 3'
flanking regions, respectively, as shown in Figure 18 and
itemized in Table 5. Both oligonucleotides were
purchased from the HSC DNA Biotechnology Service Center
(Toronto). Approximately 500 ng of genomic DNA from
cultured lymphoblastoid cell lines of the parents and the
CF child of Family 5 were used in each reaction. The DNA
samples were denatured at 94°C for 30 sec, primers
annealed at 55°C for 30 sec, and extended at 72°C for 50
sec (with 0.5 unit of Taq polymerase, Perkin-
Elmer/Cetus, Norwalk, CT) for 30 cycles, with a final
extension period of 7 min, in a Perkin-Elmer/Cetus DNA
Thermal Cycler. Reaction conditions for PCR
amplification of other exons are set out in Table 6.
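
By way of illustration only, the exon 10 cycling program
described above can be written out as data and its programmed
hold time summed. The following Python sketch is not part of
the original procedure: the names Step, CYCLE and
hold_time_seconds are hypothetical, and ramp times between
temperatures, which depend on the instrument, are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


# Values taken from the exon 10 reaction described in the text.
CYCLE = (
    Step("denature", 94, 30),
    Step("anneal", 55, 30),
    Step("extend", 72, 50),
)
N_CYCLES = 30
FINAL_EXTENSION = Step("final extension", 72, 7 * 60)


def hold_time_seconds(cycle, n_cycles, final):
    """Sum the programmed hold times; ramp times are not modelled."""
    return n_cycles * sum(step.seconds for step in cycle) + final.seconds


total = hold_time_seconds(CYCLE, N_CYCLES, FINAL_EXTENSION)
print(f"programmed hold time: {total} s ({total / 60:.0f} min)")
```

The same structure could hold any row of Table 6 by changing
the step values.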

Hybridization analysis of the PCR products from
three individuals of Family 5 of group IIIa was
performed. The carrier mother and father are represented
by a half-filled circle and square, respectively, and the
affected son is a filled square in Figure 19a. The
conditions for hybridization and washing have been
previously described (Kerem et al, supra). There is a
relatively weak signal in the father's PCR product with
the mutant (oligo ΔF508) probe. In Figure 19b, DNA
sequence analysis of the clone 5-3-15 and the PCR
products from the affected son and the carrier father are
shown. The arrow in the center panel indicates the
presence of both A and T nucleotide residues at the same
position; the arrow in the right panel indicates the
points of divergence between the normal and the ΔI507
sequence. The sequence ladders shown are derived from
the reverse-complements, as will be described later.
Figure 19c shows the DNA sequences and their
corresponding amino acid sequences of the normal, ΔI507,
and ΔF508 alleles spanning the mutation sites.
With reference to Figure 19a, the PCR-amplified DNA from
the carrier father, who contributed the group IIIa CF
chromosome to the affected son, hybridized less
efficiently with the ΔF508 ASO than that from the mother,
who carried the group Ia CF chromosome. The difference
became apparent when the hybridization signals were
compared to those with the normal ASO probe. This result
therefore indicated that the mutation carried by the
group IIIa CF chromosome might not be identical to ΔF508.

To define the nucleotide sequence corresponding to
the mutant allele on this chromosome, the PCR-amplified
product of the father's DNA was excised from a
polyacrylamide-electrophoretic gel and cloned into a
sequencing vector.

The general procedures for DNA isolation and
purification for purposes of cloning into a sequencing
vector are described in J. Sambrook, E.F. Fritsch, T.
Maniatis, Molecular Cloning: A Laboratory Manual, 2nd ed.
(Cold Spring Harbor Press, N.Y. 1989). The two
homoduplexes generated by PCR amplification of the
paternal DNA were purified from a 5% non-denaturing
polyacrylamide gel (30:1 acrylamide:bis-acrylamide). The
appropriate bands were visualized by staining with
ethidium bromide, excised and eluted in TE (10 mM Tris-
HCl; 1 mM EDTA; pH 7.5) for 2 to 12 hours at room
temperature. The DNA solution was sequentially treated
with Tris-equilibrated phenol, phenol/CHCl3 and CHCl3.
The DNA samples were concentrated by precipitation in
ethanol and resuspension in TE, incubated with T4
polynucleotide kinase in the presence of ATP, and ligated
into dephosphorylated, blunt-ended Bluescript KS- vector
(Stratagene, San Diego, CA). Clones containing amplified
product generated from the normal parental chromosome
were identified by hybridization with the oligonucleotide
N as described in Kerem et al, supra.

Clones containing the mutant sequence were
identified by their failure to hybridize to the normal
ASO (Kerem et al, supra). One clone, 5-3-15, was isolated
and its DNA sequence determined. The general protocol
for sequencing cloned DNA is essentially as described
[J.R. Riordan et al, Science 245:1066 (1989)] with the
use of a U.S. Biochemicals Sequenase kit. To verify the
sequence and to exclude any errors introduced by DNA
polymerase during PCR, the DNA sequences for the PCR
products from the father and one of the affected children
were also determined directly without cloning.

This procedure was accomplished by denaturing 2
pmoles of gel-purified double-stranded PCR product in 0.2
M NaOH/0.2 mM EDTA (5 min at room temperature),
neutralizing by adding 0.1 volume of 2 M ammonium acetate
(pH 5.4) and precipitating with 2.5 volumes of ethanol
at -70°C for 10 min. After washing with 70% ethanol, the
DNA pellet was dried and redissolved in a sequencing
reaction buffer containing 4 pmoles of the
oligonucleotide primer 10i-3 of Figure 18, dithiothreitol
(8.3 mM) and [α-35S]-dATP (0.8 µM, 1000 Ci/mmole). The
mixture was incubated at 37°C for 20 min, following
which 2 µl of labelling mix, as included in the
Sequenase kit, and then 2 units of Sequenase enzyme were
added. Aliquots of the reaction mixture (3.5 µl) were
transferred, without delay, to tubes each containing 2.5
µl of ddGTP, ddATP, ddTTP and ddCTP solutions (U.S.
Biochemicals Sequenase kit) and the reactions were
stopped by addition of the stop solution.

The DNA sequence for this mutant allele is shown in
Figure 19b. The data derived from the cloned DNA and
direct sequencing of the PCR products of the affected
child and the father are all consistent with a 3 bp
deletion when compared to the normal sequence (Figure
19c). The deletion of this 3 bp (ATC) at the I506 or
I507 position results in the loss of an isoleucine
residue from the putative CFTR, within the same ATP-
binding domain where ΔF508 resides, but it is not evident
whether this deleted amino acid corresponds to
position 506 or 507. Since the 506 and 507 positions are
repeats, it is at present impossible to determine in
which position the 3 bp deletion occurs. For convenience
in later discussions, however, we refer to this deletion
as ΔI507.
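
The ambiguity between positions 506 and 507 can be
illustrated with a short Python sketch. The sequence below is
an assumption: it is codons 504 to 511 of the normal CFTR
coding sequence as commonly published, not a sequence
reproduced in this section, and should be checked against
Figure 1. The helper delete_codon is hypothetical.

```python
# Assumed normal coding sequence, codons 504-511:
# GAA AAT ATC ATC TTT GGT GTT TCC
#  E   N   I   I   F   G   V   S
NORMAL = "GAAAATATCATCTTTGGTGTTTCC"
FIRST_CODON = 504


def delete_codon(seq, codon_number, first_codon=FIRST_CODON):
    """Return seq with the three bases of the given codon removed."""
    start = (codon_number - first_codon) * 3
    return seq[:start] + seq[start + 3:]


del_506 = delete_codon(NORMAL, 506)
del_507 = delete_codon(NORMAL, 507)

# Both deletions remove one copy of the ATC repeat, so the
# resulting sequences cannot be told apart.
print(del_506)
print(del_507)
print("identical:", del_506 == del_507)
```

Because the two ATC codons are tandem repeats, the two
deletions give the same product, which is why the position
cannot be assigned from sequence alone.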

The fact that the ΔI507 and ΔF508 mutations occur in
the same region of the presumptive ATP-binding domain of
CFTR is surprising. Although the entire sequence of the
ΔI507 allele has not been examined, as has been done for
ΔF508, the strategic location of the deletion argues that
it is the responsible mutation for this allele. This
argument is further supported by the observation that
this alteration was not detected in any of the normal
chromosomes studied to date (Kerem et al, supra). The
identification of a second single amino acid deletion in
the ATP-binding domain of CFTR also provides information
about the structure and function of this protein. Since
deletion of either the phenylalanine residue at position
508 or the isoleucine at position 507 is sufficient to
affect the function of CFTR such that it causes CF
disease, it is suggested that these residues are involved
in the folding of the protein but not directly in the
binding of ATP. That is, the length of the peptide is
probably more important than the actual amino acid
residues in this region. In support of this hypothesis,
it has been found that the phenylalanine residue can be
replaced by a serine, and the isoleucine at position 506
by valine, without apparent loss of function of CFTR.

When the nucleotide sequence of ΔI507 is compared to
that of ΔF508 at the ASO-hybridizing region, it was noted
that the difference between the two alleles was only an
A → T change (Figure 19c). This subtle difference thus
explained the cross-hybridization of the ΔF508-ASO to
ΔI507. These results therefore exemplified the
importance of careful examination of both parental
chromosomes in performing ASO-based genetic diagnosis.
It has been determined that the ΔF508 and ΔI507 mutations
can be distinguished by increasing the stringency of the
oligonucleotide hybridization conditions or by detecting
the unique mobility of the heteroduplexes formed between
each of these sequences and the normal DNA on a
polyacrylamide gel. The stringency of hybridization can
be increased by using a washing temperature of 45°C
instead of the prior 39°C in the presence of 2XSSC (1XSSC
= 150 mM NaCl and 15 mM Na citrate).
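
The single-base difference between the two deletion alleles
can be checked with a short Python sketch. The sequences are
assumptions: they are derived from codons 504 to 511 of the
normal CFTR coding sequence as commonly published
(GAA AAT ATC ATC TTT GGT GTT TCC), with ATC removed for ΔI507
and CTT removed for ΔF508, and should be checked against
Figure 19c. The helper mismatches is hypothetical.

```python
# Assumed allele sequences over the same 21-base window.
DELTA_I507 = "GAAAATATCTTTGGTGTTTCC"  # ATC deleted
DELTA_F508 = "GAAAATATCATTGGTGTTTCC"  # CTT deleted


def mismatches(a, b):
    """List (offset, base_in_a, base_in_b) wherever two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = mismatches(DELTA_F508, DELTA_I507)
print(diffs)
```

A probe written for one deletion therefore mismatches the
other at a single base, which is consistent with the
cross-hybridization seen at the lower washing temperature.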

Identification of the ΔI507 and ΔF508 alleles by
polyacrylamide gel electrophoresis is shown in Figure 20.
The PCR products were prepared from the three family
members and separated on a 5% polyacrylamide gel as
described above. A DNA sample from a known heterozygous
ΔF508 carrier is included for comparison. With reference
to Figure 20, the banding pattern of the PCR-amplified
genomic DNA from the father, who is the carrier of ΔI507,
is clearly distinguishable from that of the mother, who
is a carrier of the ΔF508 mutation. In
this gel electrophoresis test, there were actually three
individuals (the carrier father and the two affected sons
in Family 5) who carried the ΔI507 deletion. Since they
all belong to the same family, they represent only one
single CF chromosome in our population analysis [Kerem et
al, supra]. The two patients, who also inherited the ΔF508
mutation from their mother, showed typical symptoms of CF
with pancreatic insufficiency. The father of this family
was the only parent who carried this ΔI507 mutation; no
other CF parents showed a reduced hybridization intensity
signal with the ΔF508 mutant oligonucleotide probe or a
peculiar heteroduplex pattern for the PCR product (as
defined above) in the retrospective study. In addition,
two representatives of the group IIIb and one of the
group IIIc CF chromosomes from our collection [Kerem et
al, supra] were sequenced, but none was found to contain
ΔI507. Since the electrophoresis technique eliminates
the need for probe-labelling and hybridization, it may
prove to be the method of choice for detecting carriers
on a large population scale [J. M. Rommens et al, Am. J.
Hum. Genet. 46:395-396 (1990)].

The present data also indicate that there is a
strict correlation between DNA marker haplotype and
mutation in CF. The ΔF508 deletion is the most common CF
mutation, and it occurred on a group Ia chromosome
background [Kerem et al, supra]. The ΔI507 mutation is,
however, rare in the CF population; the one group IIIa CF
chromosome carrying this deletion is the only example in
our studied population (1/219). Since the group III
haplotype is relatively common among the normal
chromosomes (17/198), the ΔI507 deletion probably
occurred recently. Additional studies with larger
populations of different geographic and ethnic
backgrounds should provide further insight into
the origins of these mutations.

3.8 ADDITIONAL CF MUTATIONS

Following the above procedures, other mutations in
the CF gene have been identified. The following brief
description of each identified mutation is based on the
previously described procedures for locating the mutation
involving use of PCR procedures. The mutations are given
short form names. The numbering used in these
abbreviations refers to either the DNA sequence or the
amino acid sequence position of the mutation depending on
the type of mutation. For example, splice mutations and
frameshift mutations are defined using the DNA sequence
position. Most other mutations derive their nomenclature
from the amino acid residue position. The description of
each mutation clarifies the nomenclature in any event.

For example, mutations G542X, Q493X, 3659 del C and 556
del A result in shortened polypeptides significantly
different from the single amino acid deletions or
alterations. G542X and Q493X involve a polypeptide
including only the first 541 and 492 amino acid residues,
respectively, of the normal 1480 amino acid polypeptide.
3659 del C and 556 del A also involve shortened versions
and will include additional amino acid residues.
Mutations 711+1G → T and 1717-1G → A are predicted to lead
to polypeptides which cannot as yet be exactly
defined. They probably do lead to shortened polypeptides
but could contain additional amino acids. DNA sequences
encoding these mutant polypeptides will probably
contain intron sequence from the normal gene or possibly
deleted exons.
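
The amino acid based short form names can be read
mechanically, as the following Python sketch suggests. It is
an illustration only: the function names are hypothetical, X
is taken to denote a stop codon as in G542X, and the sketch
handles only names of the form letter, number, letter. A stop
at codon N leaves the first N - 1 residues.

```python
import re

# One-letter residue, position, one-letter residue (X = stop).
_NAME = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_short_name(name):
    """Split a name such as 'G542X' into (normal, position, mutant)."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not an amino acid based name: {name!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def truncated_length(name):
    """Residues retained by a stop mutation, or None if not a stop."""
    _, position, mutant = parse_short_name(name)
    return position - 1 if mutant == "X" else None


for name in ("G542X", "Q493X", "G551D"):
    print(name, parse_short_name(name), truncated_length(name))
```

Names based on DNA position, such as 3659 del C or
1717-1G → A, follow a different pattern and are deliberately
rejected by this sketch.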

MUTATIONS IN EXON 1

In the 129G → C mutation, there is a single basepair
change of G to C at nucleotide 129 of the cDNA sequence
of Figure 1. The PCR product for amplifying genomic DNA
containing this mutation is derived from the B115-B and
10D primers as set out in Table 5. The genomic DNA is
amplified as per the conditions of Table 6.

3.8.1 MUTATIONS IN EXON 3

The G85E mutation in exon 3 involves a G to A
transition at nucleotide position 386. It is detected in
family #26, a French Canadian family classified as PI.
This predicted Gly to Glu amino acid change is associated
with a group IIb haplotype. The mutation destroys a
HinfI site. The PCR product derived from the 3i-5 and
3i-3 primers, as per conditions of Table 6, is cleaved by
this enzyme into 3 fragments of 172, 105 and 32 bp
for the normal sequence; a fragment of 277
bp would be present for the mutant sequence. We analyzed
54 CF chromosomes, 8 from group II, and 50 normal
chromosomes, 44 from group II, and did not find another
example of G85E.
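
The fragment sizes quoted for the HinfI digest can be
reproduced with a short Python sketch. It is an illustration
under stated assumptions: the order of the fragments along
the PCR product (172, then 105, then 32 bp) is assumed, since
only the sizes are given, and the function fragments is
hypothetical. The 277 bp mutant fragment implies only that
the 172 and 105 bp fragments are adjacent.

```python
def fragments(length, cut_offsets):
    """Fragment sizes for a linear product cut at the given offsets."""
    edges = [0, *sorted(cut_offsets), length]
    return [right - left for left, right in zip(edges, edges[1:])]


PRODUCT_LENGTH = 172 + 105 + 32      # sum of the normal fragments
NORMAL_SITES = [172, 172 + 105]      # assumed fragment order
MUTANT_SITES = [172 + 105]           # G85E destroys the site at 172

print("normal:", fragments(PRODUCT_LENGTH, NORMAL_SITES))
print("mutant:", fragments(PRODUCT_LENGTH, MUTANT_SITES))
```

Loss of one site merges two neighbouring fragments, so the
mutant allele is recognized by the appearance of the 277 bp
band.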

3.8.2 MUTATIONS IN EXON 4

556 del A is a frameshift mutation in exon 4 in a
single CF chromosome (Toronto family #17, GM1076). There
is a deletion of A at nucleotide position 556. This
mutation is associated with the Group IIIb haplotype and is
not found in 31 other CF chromosomes (9 from IIIb) and 30
N chromosomes (16 from IIIb). The mutation creates a BglI
enzyme cleavage site. The PCR primers are 4i-5 and 4i-3
(see Table 5); the enzyme cuts the mutant PCR
product (437 bp) into 2 fragments of 287 and 150 bp in
size.

The I148T mutation in exon 4 involves a T to C
basepair transition at nucleotide position 575. This
results in an Ile to Thr change at amino acid position
148 of Figure 1. Genomic DNA containing this mutation is
amplified with primers 4i-5
and 4i-3 as set out in Table 5. The reaction conditions
for amplifying the genomic DNA are set out in Table 6.

3.8.3 MUTATIONS IN EXON 5

In mutation G178R, the Gly to Arg missense mutation
in exon 5 is due to a G to A change at nucleotide
position 664. The mutation is found on the mother's CF
chromosome in family #50; the other mutation in this
family is ΔF508. Primers 5i-5 and 5i-3 were used for
amplifying genomic DNA as outlined in Tables 5 and 6.

3.8.4 MUTATIONS IN EXON 9

A mutation in exon 9 is a change of alanine (GCG) to
glutamic acid (GAG) at amino acid position 455
(A455 → E). Two of the 38 non-ΔF508 CF chromosomes
examined carry this mutation; both of them are from
patients of French-Canadian origin, which we have
identified in our work as families #27 and #53, and they
belong to haplotype group Ib. The mutation is detectable
by allele-specific oligonucleotide (ASO) hybridization
with PCR-amplified genomic DNA sequence. The PCR primers
are 9i-5 (5'-TAATGGATCATGGGCCATGT-3') and 9i-3
(5'-ACAGTGTTGAATGTGGTGCA-3') for amplifying genomic DNA
under the conditions of Table 6. The ASOs are
5'-GTTGTTGGCGGTTGCT-3' for the normal allele and
5'-GTTGTTGGAGGTTGCT-3' for the mutant. The oligonucleotide
hybridization is as described in Kerem et al (1989) supra
at 37°C and the washings are done twice with 5XSSC for 10
min each at room temperature followed by twice with 2XSSC
for 30 min each at 52°C. Although the alanine at
position 455 (Ala455) is not present in all ATP-binding
folds across species, it is present in all known members
of the P-glycoprotein family, the protein most similar to
CFTR. Further, A455 → E is believed to be a mutation
rather than a sequence polymorphism because the change is
not found in 16 non-ΔF508 CF chromosomes and three normal
chromosomes carrying the same group I haplotype.
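
The relation between the two ASOs and the stated codon change
can be verified with a short Python sketch. It is an
illustration only; the function single_difference is
hypothetical, and the probe sequences are those quoted above.

```python
def single_difference(normal, mutant):
    """Return (offset, normal_base, mutant_base) for probes that differ at one base."""
    if len(normal) != len(mutant):
        raise ValueError("probes must be the same length")
    diffs = [(i, a, b) for i, (a, b) in enumerate(zip(normal, mutant)) if a != b]
    if len(diffs) != 1:
        raise ValueError(f"expected exactly one difference, found {len(diffs)}")
    return diffs[0]


NORMAL_ASO = "GTTGTTGGCGGTTGCT"
MUTANT_ASO = "GTTGTTGGAGGTTGCT"

offset, normal_base, mutant_base = single_difference(NORMAL_ASO, MUTANT_ASO)

# Triplet centred on the changed base in each probe.
normal_triplet = NORMAL_ASO[offset - 1:offset + 2]
mutant_triplet = MUTANT_ASO[offset - 1:offset + 2]
print(offset, normal_base, mutant_base, normal_triplet, mutant_triplet)
```

The changed base lies at the centre of the 16-mer, flanked by
GCG in the normal probe and GAG in the mutant probe, in
agreement with the alanine to glutamic acid change.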
3.8.5 MUTATIONS IN EXON 10

In the Q493X mutation, Gln493 (CAG) is changed into a
stop codon (TAG) in Toronto family #9 (nucleotide
position 1609 C → T). The mutation occurs on a CF
chromosome with haplotype IIIb; it is not found in 28
normal chromosomes (15 of which belong to IIb) nor in 33
other CF chromosomes (5 of which belong to IIIb). The
mutation can be detected by allele-specific PCR, with
10i-5 as the common PCR primer,
5'-GGCATAATCCAGGAAAACTG-3' for the
normal sequence and 5'-GGCATAATCCAGGAAAACTA-3' for the
mutant allele. The PCR condition is 6 min at 94°
followed by cycles of 30 sec at 94°, 30 sec at 57° and 90
sec at 72°, with 100 ng of each primer and ~400 ng
genomic DNA. The primers 9i-3 and 9i-5 may be used for
internal PCR control as they share the same reaction
condition.
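
Allele-specific PCR of this kind relies on primers that are
identical except at their 3' terminal base. The following
Python sketch checks that property for the Q493X primers
quoted above and, for comparison, for the S549R and
3659 del C primer pairs quoted in later paragraphs. It is an
illustration only; the function name is hypothetical.

```python
# (normal primer, mutant primer), each written 5' to 3'.
PRIMER_PAIRS = {
    "Q493X": ("GGCATAATCCAGGAAAACTG", "GGCATAATCCAGGAAAACTA"),
    "S549R": ("TGCTCGTTGACCTCCA", "TGCTCGTTGACCTCCC"),
    "3659 del C": ("GTATGGTTTGGTTGACTTGG", "GTATGGTTTGGTTGACTTGT"),
}


def differs_only_at_3prime_end(normal, mutant):
    """True if the primers match everywhere except the last (3') base."""
    return (
        len(normal) == len(mutant)
        and normal[:-1] == mutant[:-1]
        and normal[-1] != mutant[-1]
    )


for name, (normal, mutant) in PRIMER_PAIRS.items():
    print(name, differs_only_at_3prime_end(normal, mutant),
          normal[-1], "vs", mutant[-1])
```

Placing the discriminating base at the 3' end means that a
mismatched primer is extended poorly, so each primer
preferentially amplifies its own allele.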

3.8.6 MUTATIONS IN EXON 11

In mutation G542X, the glycine codon (GGA) at amino
acid position 542 is changed to a stop codon (TGA) (G542
→ Stop). The single chromosome carrying this mutation is
of Ashkenazic Jewish origin (family A) and has the B
haplotype (XV2C allele 1; KM.19 allele 2). The mutant
sequence can be detected by hybridization analysis with
allele-specific oligonucleotides (ASOs) on genomic DNA
amplified under conditions of Table 6 by PCR with the
11i-5 and 11i-3 oligonucleotide primers. The normal ASO
is 5'-ACCTTCTCCAAGAACT-3' and the mutant ASO is
5'-ACCTTCTCAAAGAACT-3'. The oligonucleotide hybridization
condition is as described in Kerem et al (1989) supra and
the washing conditions are twice in 5XSSC for 10 min
each at room temperature followed by twice in 2XSSC for
30 min each at 45°C. The mutation is not detected in 52
other non-ΔF508 CF chromosomes, 11 of which are of Jewish
origin (three have a B haplotype), nor in 13 normal
chromosomes.
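
The G542X probes as quoted differ by C to A, whereas the codon
change is described as GGA to TGA. The two statements agree if
the probes are read as the complement of the coding strand,
which the following Python sketch illustrates. It is an
illustration only; reverse_complement is a hypothetical
helper.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


NORMAL_ASO = "ACCTTCTCCAAGAACT"
MUTANT_ASO = "ACCTTCTCAAAGAACT"

normal_coding = reverse_complement(NORMAL_ASO)
mutant_coding = reverse_complement(MUTANT_ASO)

print("normal:", NORMAL_ASO, "->", normal_coding)
print("mutant:", MUTANT_ASO, "->", mutant_coding)
```

The reverse complements contain GGA and TGA respectively at
the same offset, matching the glycine to stop change given in
the text.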

In mutation S549R, the highly conserved serine
residue of the nucleotide binding domain at position 549
is changed to arginine (S549 → R); the codon change is
AGT → AGG. The CF chromosome with this mutation is
carried by a non-Ashkenazic Jewish patient from Morocco
(family B). The chromosome also has the B haplotype.
Detection of this mutation may be achieved by ASO
hybridization or allele-specific PCR. In the ASO
hybridization procedure, the genomic DNA sequence is
first amplified under conditions of Table 6 by PCR with
the 11i-5 and 11i-3 oligonucleotides; the ASO for the
normal sequence is 5'-ACACTGAGTGGAGGTC-3' and that for
the mutant is 5'-ACACTGAGGGGAGGTC-3'. The oligonucleotide
hybridization condition is as described by Kerem et al
(1989) supra and the washings are done twice in 5XSSC
for 10 min each at room temperature followed by twice in
2XSSC for 30 min each at 56°C. In the allele-specific
PCR amplification, the oligonucleotide primer for the
normal sequence is 5'-TGCTCGTTGACCTCCA-3', that for the
mutant is 5'-TGCTCGTTGACCTCCC-3' and that for the common,
outside sequence is 11i-5. The reaction is performed
with 500 ng of genomic DNA, 100 ng of each of the
oligonucleotides and 0.5 unit of Taq polymerase. The DNA
template is first denatured by heating at 94°C for 6
min, followed by 30 cycles of 94° for 30 sec, 55° for 30
sec and 72° for 60 sec. The reaction is completed by
heating at 72° for 7 min. This S549 → R mutation is
not present in 52 other non-ΔF508 CF chromosomes, 11 of
which are of Jewish origin (three have a B haplotype),
nor in 13 normal chromosomes.

In the S549I mutation there is an AGT → ATT change
(nucleotide position 1778 G → T), which represents the third
mutation involving this amino acid codon and results in a
loss of the DdeI site. We have only one example, from a
patient of Arabic origin, and it has been sequenced; no
other DdeI-resistant chromosome is found in 5 other Arabic
CF, 21 Jewish CF, 41 Canadian CF, and 13 Canadian normal
chromosomes.

In mutation R560T, the arginine (AGG) at amino acid
position 560 is changed to threonine (ACG). The
individual carrying this mutation (R560 → T) is from a
family we have identified in our work as family #32 and
the chromosome is marked by haplotype IIIb. The mutation
creates a MaeII site which cleaves the PCR product of
exon 11 (generated with primers 11i-5 and 11i-3 under
conditions of Table 6) into two fragments of 214 and 204
bp in size. None of the 36 non-ΔF508 CF chromosomes
(seven of which have haplotype IIIb) or 23 normal
chromosomes (16 have haplotype IIIb) carried this
sequence alteration. The R560 → T mutation is also not
present on eight CF chromosomes with the ΔF508 mutation.

In mutation G551D, glycine (G) at amino acid position
551 is changed to aspartic acid (D). G551 is a highly
conserved residue within the ATP-binding fold. The
corresponding codon change is from GGT to GAT. The
G551 → D change is found in 2 of our families (#1, #38)
with pancreatic insufficient (PI) CF patients and 1
family (#54) with a pancreatic sufficient (PS) patient.
The other CF chromosomes in families #1 and #38 carry the
ΔF508 mutation and that in family #54 is unknown. Based
on our "severe and mild mutation" hypothesis (Kerem et
al, 1989), this mutation is expected to be a "severe"
one. All 3 chromosomes carrying this mutation belong to
Group IIIb. This G551 → D substitution does not represent
a sequence polymorphism because the change is not
detected in 35 other CF chromosomes without the ΔF508
deletion (5 of them from group IIIb) and 19 normal
chromosomes (including 5 from group IIIb). To detect
this mutation, the genomic DNA region may be amplified
under conditions of Table 6 by PCR with primers 11i-5
(5'-CAACTGTGGTTAAAGCAATAGTGT-3') and 11i-3
(5'-GCACAGATTCTGAGTAACCATAAT-3') and examined for the
presence of a MboI (Sau3A) site created by the nucleotide
change; the uncut (normal) form is 419 bp in length and
the digestion products (from the mutant form) are 241 and
178 bp.

3.8.7 MUTATIONS IN EXON 12

In the Y563N mutation, a T to A change is detected at
nucleotide position 1820 in exon 12. This switch would
result in a change from Tyr to Asn at amino acid position
563. It is found in a single family with 2 PS patients,
but the mutation in the other chromosome is unknown. We
think Y563N is probably a missense mutation because (1)
the T to A change is not found in 59 other CF
chromosomes, with 8 having the same haplotype (IIa) and
30 having ΔF508; and (2) this alteration is not found in
54 normal chromosomes, with 39 having the IIa haplotype.
Unfortunately, the amino acid change is not drastic
enough to permit a strong argument. This putative
mutation can be detected by ASO hybridization with a
normal (5'-AGCAGTATACAAAGATGC-3') and a mutant
(5'-AGCAGTAAACAAAGATGC-3') oligonucleotide probe. The
washing condition is 54°C with 2XSSC.

In the P574H mutation, the C at nucleotide position
1853 is changed to A. Although the amino acid Pro at
this position is not highly conserved across different
ATP-binding folds, the change to His could be a drastic
substitution. This change is not detected in 52 other CF
chromosomes nor in 15 normal chromosomes, 4 of which have
the same group IV haplotype. Based on these arguments,
we believe P574H is a mutation. To detect this putative
mutation, one may use the following ASOs:
5'-GACTCTCCTTTTGGA-3' for the normal and
5'-GACTCTCATTTTGGA-3' for the mutant. Washing should be
done at 47° in 2XSSC.

In the L1077P mutation, the T at nucleotide position
3362 is changed to C. This results in a change of the
amino acid Leu to Pro at amino acid position 1077 in
Figure 1. As with the other mutations in this exon, the
genomic DNA is amplified by use of the primers of Table 5,
namely 17bi-5 and 17bi-3. The reaction conditions for
amplifying the genomic DNA are set out in Table 6.

The Y1092X mutation involves a change of C at
nucleotide position 3408 to A. This would result in
protein synthesis termination at amino acid position 1092.
Hence the amino acid Tyr is not present in the truncated
polypeptide. As with the above procedures, the primers
used in amplifying this mutation are 17bi-5 and 17bi-3.

3.8.8 MUTATIONS IN EXON 19

3659 del C is a frameshift mutation in exon 19 in a
single CF chromosome (Toronto family #2). There is a
deletion of C at nucleotide position 3659 or 3660. The
haplotype is IIa. The mutation is not
present in 57 non-ΔF508 CF chromosomes (7 from IIa) and
50 N chromosomes (43 from IIa). The deletion may be
detected by PCR with a common oligonucleotide primer,
19i-5 (see Table 5), and 2 ASO primers, HSC8
(5'-GTATGGTTTGGTTGACTTGG-3') for the normal and HSC9
(5'-GTATGGTTTGGTTGACTTGT-3') for the mutant allele; the PCR
condition is as usual except that the annealing temperature
is 60°C to improve specificity.

3.8.9 MUTATIONS IN INTRON 4

In the 621+1G → T mutation there is a single bp
change affecting the splice site (GT → TT) at the 3' end
of exon 4. This mutation is detected in 5 French-Canadian
CF chromosomes (one each in Toronto families #22, 23, 26,
36 and 53) but not in 33 other CF chromosomes (18 from
the same group, group I) and 29 N chromosomes (13 from
group I). The mutation creates a MseI site. Genomic DNA
may be amplified by the 2 intron primers, 4i-5 and 4i-3,
and cut with MseI to distinguish the normal and mutant
alleles. The normal would give 4 fragments of 33, 35, 71
and 298 bp in size; the 298 bp fragment in the mutant is
cleaved by the enzyme to give fragments of 54 and 244 bp.
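
The MseI pattern described above lends itself to a simple
genotype call, sketched below in Python. This is an
illustration only: the function call_genotype is
hypothetical, and it assumes that the 298, 244 and 54 bp
bands can all be resolved on the gel. The 33, 35 and 71 bp
fragments are common to both alleles and are not used.

```python
NORMAL_FRAGMENTS = (33, 35, 71, 298)
MUTANT_FRAGMENTS = (33, 35, 71, 54, 244)

# Both digests must account for the whole 437 bp product.
PRODUCT_LENGTH = sum(NORMAL_FRAGMENTS)


def call_genotype(observed_sizes):
    """Classify a digest from the diagnostic bands it contains."""
    bands = set(observed_sizes)
    has_normal = 298 in bands
    has_mutant = {54, 244} <= bands
    if has_normal and has_mutant:
        return "heterozygous"
    if has_mutant:
        return "homozygous mutant"
    if has_normal:
        return "homozygous normal"
    return "uninterpretable"


print(PRODUCT_LENGTH, sum(MUTANT_FRAGMENTS))
print(call_genotype(NORMAL_FRAGMENTS))
print(call_genotype(MUTANT_FRAGMENTS))
print(call_genotype(set(NORMAL_FRAGMENTS) | set(MUTANT_FRAGMENTS)))
```

A carrier shows the uncut 298 bp band from the normal allele
together with the 244 and 54 bp bands from the mutant allele.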

3.8.10 MUTATIONS IN INTRON 5

In the 711+1G → T mutation, this G to T switch
occurs at the splice junction after exon 5. The mutation
is found on the mother's CF chromosome in family #22, a
French Canadian family from Chicoutimi. The other
mutation in this family is 621+1G → T.

3.8.11 MUTATIONS IN INTRON 10

In the 1717-1G → A mutation, a putative splice
mutation is found in front of exon 11. This mutation is
located at the last nucleotide of the intron before exon
11. The mutation may be detected with the following
ASOs: normal ASO = 5'-TTTGGTAATAGGACATCTCC-3'; mutant ASO =
5'-TTTGGTAATAAGACATCTCC-3'. The washing conditions after
hybridization are 5XSSC twice for 10 min at room temp,
then 2XSSC twice for 30 min at 47° for the mutant ASO and
2XSSC twice for 30 min at 48° for the normal ASO. We have
only 1 single example, from an Arabic patient, and there is
no haplotype data. The mutation is not found in 5 other
Arabic, 21 Jewish, and 41 Canadian CF chromosomes, nor in
13 normal chromosomes.

3.9 DNA SEQUENCE POLYMORPHISMS

Nucleotide position     Amino acid change

1540 (A or G)           Met or Val
1716 (G or A)           no change (Glu)
2694 (T or G)           no change (Thr)
356 (G or A)            Arg or Gln

A polymorphism is detected at nucleotide position 1540:
the A residue can be substituted by G, changing the
corresponding amino acid from Met to Val. At position
2694, the T residue can be a G, although this does not
change the encoded amino acid. The polymorphism may be
detected by restriction enzymes AvaII or Sau96I. These
changes are present in the normal population and show
good correlation with haplotypes but not with CF disease.

There can be a G to A change for the last nucleotide
of exon 10 (nucleotide position 1716). We think that
this nucleotide substitution is a sequence polymorphism
because (a) it does not alter the amino acid, (b) it is
unlikely to cause a splicing defect and (c) it occurs on
some normal chromosomes. In two Canadian families, this
rare allele is found associated with haplotype IIIb.

The more common nucleotide at 356 (G) is found to be
changed to A in the father's normal chromosome in family
#54. The amino acid changes from Arg to Gln.

4.0 CFTR PROTEIN

As discussed with respect to the DNA sequence of
Figure 1, analysis of the sequence of the overlapping
cDNA clones predicted an unprocessed polypeptide of 1480
amino acids with a molecular mass of 168,138 daltons. As
later described, due to polymorphisms in the protein, the
molecular weight of the protein can vary due to possible
substitutions or deletion of certain amino acids. The
molecular weight will also change due to the addition of
carbohydrate units to form a glycoprotein. It is also
understood that the functional protein in the cell will
be similar to the unprocessed polypeptide, but may be
modified due to cell metabolism.

Accordingly, purified normal CFTR polypeptide is
characterized by a molecular weight of about 170,000
daltons and having epithelial cell transmembrane ion
conductance activity. The normal CFTR polypeptide, which
is substantially free of other human proteins, is encoded
by the aforementioned DNA sequences and according to one
embodiment, that of Figure 1. Such polypeptide displays
the immunological or biological activity of normal CFTR
polypeptide. As will be later discussed, the CFTR
polypeptide and fragments thereof may be made by chemical
or enzymatic peptide synthesis or expressed in an
appropriate cultured cell system. The invention provides
purified 507 mutant CFTR polypeptide which is
characterized by cystic fibrosis-associated activity in
human epithelial cells. Such 507 mutant CFTR
polypeptide, as substantially free of other human
proteins, can be encoded by the 507 mutant DNA sequence.

STRUCTURE OF CFTR

The most characteristic feature of the predicted 
protein is the presence of two repeated motifs, each of 
which consists of a set of amino acid residues capable of 
spanning the membrane several times followed by sequence 
resembling consensus nucleotide (ATP) -binding folds 
(NBFs) (Figures 11, 12 and 15). These characteristics
are remarkably similar to those of the mammalian 
multidrug resistant P-glycoprotein and a number of other 
membrane-associated proteins, thus implying that the 
predicted CF gene product is likely to be involved in the 
transport of substances (ions) across the membrane and is 
probably a member of a membrane protein super family. 

Figure 13 is a schematic model of the predicted CFTR
protein. In Figure 13, cylinders indicate membrane
spanning helices, hatched spheres indicate NBFs. The
stippled sphere is the polar R-domain. The 6 membrane
spanning helices in each half of the molecule are
depicted as cylinders. The inner cytoplasmically
oriented NBFs are shown as hatched spheres with slots to
indicate the means of entry by the nucleotide. The large
polar R-domain which links the two halves is represented
by a stippled sphere. Charged individual amino acids
within the transmembrane segments and on the R-domain
surface are depicted as small circles containing the
charge sign. Net charges on the internal and external
loops joining the membrane cylinders and on regions of
the NBFs are contained in open squares. Sites for
phosphorylation by protein kinases A or C are shown by
closed and open triangles respectively. K, R, H, D, and E
are standard nomenclature for the amino acids lysine,
arginine, histidine, aspartic acid and glutamic acid,
respectively.

Each of the predicted membrane-associated regions of
the CFTR protein consists of 6 highly hydrophobic
segments capable of spanning a lipid bilayer according to
the algorithms of Kyte and Doolittle and of Garnier et al
(J. Mol. Biol. 120, 97 (1978)) (Figure 13). The
membrane-associated regions are each followed by a large
hydrophilic region containing the NBFs. Based on
sequence alignment with other known nucleotide binding
proteins, each of the putative NBFs in CFTR comprises at
least 150 residues (Figure 13). The 3 bp deletion at
position 507 as detected in CF patients is located
between the 2 most highly conserved segments of the first
NBF in CFTR. The amino acid sequence identity between
the region surrounding the isoleucine deletion and the
corresponding regions of a number of other proteins
suggests that this region is of functional importance
(Figure 15). A hydrophobic amino acid, usually one with
an aromatic side chain, is present in most of these
proteins at the position corresponding to I507 of the
CFTR protein. It is understood that amino acid
polymorphisms may exist as a result of DNA polymorphisms.
Similarly, mutations at the other positions in the
protein suggested that corresponding regions of the
protein are also of functional importance. Such
additional mutations include substitutions of:

i) Glu for Gly at amino acid position 85;

ii) Thr for Ile at amino acid position 148;

iii) Arg for Gly at amino acid position 178;

iv) Glu for Ala at amino acid position 455;

v) stop codon for Gln at amino acid position 493;

vi) stop codon for Gly at amino acid position 542;

vii) Arg for Ser or Ile for Ser at amino acid
position 549;

viii) Asp for Gly at amino acid position 551;

ix) Thr for Arg at amino acid position 560;

x) Asn for Tyr at amino acid position 563;

xi) His for Pro at amino acid position 574;

xii) Pro for Leu at amino acid position 1077;

xiii) stop codon for Tyr at amino acid position
1092.
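
The substitutions listed above can be held in a small table and rendered in the compact one-letter form often used for amino acid changes (original residue, position, replacement residue). The sketch below is illustrative only: the names `ONE_LETTER`, `SUBSTITUTIONS` and `shorthand` are hypothetical, and writing a stop codon as "X" is a convention assumed here rather than taken from the text.

```python
# Three-letter to one-letter amino acid codes for the residues in the list.
# "stop" -> "X" is an assumed convention for a stop codon.
ONE_LETTER = {
    "Gly": "G", "Glu": "E", "Ile": "I", "Thr": "T", "Arg": "R",
    "Ala": "A", "Gln": "Q", "Ser": "S", "Asp": "D", "Asn": "N",
    "Tyr": "Y", "Pro": "P", "His": "H", "Leu": "L", "stop": "X",
}

# (replacement, original, position), in the order given in the text.
SUBSTITUTIONS = [
    ("Glu", "Gly", 85), ("Thr", "Ile", 148), ("Arg", "Gly", 178),
    ("Glu", "Ala", 455), ("stop", "Gln", 493), ("stop", "Gly", 542),
    ("Arg", "Ser", 549), ("Ile", "Ser", 549), ("Asp", "Gly", 551),
    ("Thr", "Arg", 560), ("Asn", "Tyr", 563), ("His", "Pro", 574),
    ("Pro", "Leu", 1077), ("stop", "Tyr", 1092),
]


def shorthand(replacement, original, position):
    """Format 'replacement for original at position' as e.g. 'G85E'."""
    return f"{ONE_LETTER[original]}{position}{ONE_LETTER[replacement]}"


NAMES = [shorthand(*entry) for entry in SUBSTITUTIONS]

if __name__ == "__main__":
    print(", ".join(NAMES))
```

Position 549 appears twice because the text lists two alternative replacements for the same serine.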

Figure 15 shows alignment of the 3 most conserved 
segments of the extended NBF's of CFTR with comparable 
regions of other proteins. These 3 segments consist of 
residues 433-473, 488-513, and 542-584 of the N-terminal 
half and 1219-1259, 1277-1302, and 1340-1382 of the C- 
terminal half of CFTR. The heavy overlining points out 
the regions of greatest similarity. Additional general 
homology can be seen even without the introduction of 
gaps. 

Despite the overall symmetry in the structure of the 
protein and the sequence conservation of the NBFs, 
sequence homology between the two halves of the predicted 
CFTR protein is modest. This is demonstrated in Figure 
12, where amino acids 1-1480 are represented on each 
axis. Lines on either side of the identity diagonal 
indicate the positions of internal similarities. 
Therefore, while four sets of internal sequence identity 
can be detected as shown in Figure 12, using the Dayhoff 
scoring matrix as applied by Lawrence et al. [C. B.
Lawrence, D. A. Goldman, and R. T. Hood, Bull. Math. Biol.
48, 569 (1986)], three of these are only apparent at low
threshold settings for standard deviation. The strongest
identity is between sequences at the carboxyl ends of the
NBFs. Of the 66 residues aligned 27% are identical and
another 11% are functionally similar. The overall weak
internal homology is in contrast to the much higher
degree (>70%) in P-glycoprotein for which a gene
duplication hypothesis has been proposed (Gros et al,
Cell 47, 371, 1986; C. Chen et al, Cell 47, 381, 1986;
Gerlach et al, Nature 324, 485, 1986; Gros et al, Mol.
Cell. Biol. 8, 2770, 1988). The lack of conservation in
the relative positions of the exon-intron boundaries may
argue against such a model for CFTR (Figure 2).

Since there is apparently no signal-peptide sequence
at the amino-terminus of CFTR, the highly charged
hydrophilic segment preceding the first transmembrane
sequence is probably oriented in the cytoplasm. Each of
the 2 sets of hydrophobic helices is expected to form 3
transversing loops across the membrane and little
sequence of the entire protein is expected to be exposed
to the exterior surface, except the region between
transmembrane segments 7 and 8. It is of interest to note
that the latter region contains two potential sites for
N-linked glycosylation.

Each of the membrane-associated regions is followed
by a NBF as indicated above. In addition, a highly
charged cytoplasmic domain can be identified in the
middle of the predicted CFTR polypeptide, linking the 2
halves of the protein. This domain, named the R-domain,
is operationally defined by a single large exon in which
69 of the 241 amino acids are polar residues arranged in
alternating clusters of positive and negative charges.
Moreover, 9 of the 10 consensus sequences required for
phosphorylation by protein kinase A (PKA), and 7 of
the potential substrate sites for protein kinase C (PKC)
found in CFTR are located in this exon.

FUNCTION OF CFTR

Properties of CFTR can be derived from comparison to
other membrane-associated proteins (Figure 15). In
addition to the overall structural similarity with the
mammalian P-glycoprotein, each of the two predicted
domains in CFTR also shows remarkable resemblance to the
single domain structure of hemolysin B of E. coli and the
product of the White gene of Drosophila. These latter
proteins are involved in the transport of the lytic
peptide of the hemolysin system and of eye pigment
molecules, respectively. The vitamin B12 transport
system of E. coli, BtuD, and MbpX, which is a liverwort
chloroplast gene whose function is unknown, also have a
similar structural motif. Furthermore, the CFTR protein
shares structural similarity with several of the
periplasmic solute transport systems of gram negative
bacteria where the transmembrane region and the
ATP-binding folds are contained in separate proteins which
function in concert with a third substrate-binding
polypeptide.

The overall structural arrangement of the 
transmembrane domains in CFTR is similar to several 
cation channel proteins and some cation-translocating 
ATPases as well as the recently described adenylate
cyclase of bovine brain. The functional significance of
this topological classification, consisting of 6 
transmembrane domains, remains speculative. 

Short regions of sequence identity have also been
detected between the putative transmembrane regions of
CFTR and other membrane-spanning proteins.
Interestingly, there are also sequences, 18 amino acids
in length, situated approximately 50 residues from the
carboxyl terminus of CFTR and the raf serine/threonine
kinase protooncogene of Xenopus laevis which are
identical at 12 of these positions.

Finally, an amino acid sequence identity (10/13
conserved residues) has been noted between a hydrophilic
segment (position 701-713) within the highly charged
R-domain of CFTR and a region immediately preceding the
first transmembrane loop of the sodium channels in both
rat brain and eel. The charged R-domain of CFTR is not
shared with the topologically closely related
P-glycoprotein; the 241 amino acid linking-peptide is
apparently the major difference between the two proteins.

In summary, features of the primary structure of the 
CFTR protein indicate its possession of properties 
suitable to participation in the regulation and control
of ion transport in the epithelial cells of tissues
affected in CF. Secure attachment to the membrane in two
regions serves to position its three major intracellular
domains (nucleotide-binding folds 1 and 2 and the
R-domain) near the cytoplasmic surface of the cell membrane
where they can modulate ion movement through channels 
formed either by CFTR transmembrane segments themselves 
or by other membrane proteins. 

In view of the genetic data, the tissue-specificity, 
and the predicted properties of the CFTR protein, it is 
reasonable to conclude that CFTR is directly responsible 
for CF. It, however, remains unclear how CFTR is
involved in the regulation of ion conductance across the
apical membrane of epithelial cells.

It is possible that CFTR serves as an ion channel
itself. As depicted in Figure 13, 10 of the 12
transmembrane regions contain one or more amino acids
with charged side chains, a property similar to the brain
sodium channel and the GABA receptor chloride channel
subunits, where charged residues are present in 4 of the
6, and 3 of the 4, respective membrane-associated domains
per subunit or repeat unit. The amphipathic nature of
these transmembrane segments is believed to contribute to
the channel-forming capacity of these molecules.
Alternatively, CFTR may not be an ion channel but instead
serve to regulate ion channel activities. In support of
the latter assumption, none of the purified polypeptides
from trachea and kidney that are capable of
reconstituting chloride channels in lipid membranes
[Landry et al, Science 244:1469 (1989)] appear to be CFTR
if judged on the basis of the molecular mass.

In either case, the presence of ATP-binding domains
in CFTR suggests that ATP hydrolysis is directly involved
and required for the transport function. The high
density of phosphorylation sites for PKA and PKC and the
clusters of charged residues in the R-domain may both
serve to regulate this activity. The deletion of a
phenylalanine residue in the NBF may prevent proper
binding of ATP or the conformational change which this
normally elicits and consequently result in the observed
insensitivity to activation by PKA- or PKC-mediated
phosphorylation of the CF apical chloride conductance
pathway. Since the predicted protein contains several
domains and belongs to a family of proteins which
frequently function as parts of multi-component molecular
systems, CFTR may also participate in epithelial tissue
functions of activity or regulation not related to ion
transport.

With the isolated CF gene (cDNA) now in hand it is 
possible to define the basic biochemical defect in CF and 
to further elucidate the control of ion transport
pathways in epithelial cells in general. Most important,
knowledge gained thus far from the predicted structure of
CFTR together with the additional information from
studies of the protein itself provide a basis for the
development of improved means of treatment of the
disease. In such studies, antibodies have been raised to
the CFTR protein as later described.

5.0 CF SCREENING
5.1 DNA BASED DIAGNOSIS

Given the knowledge of the 85, 148, 178, 455, 493,
507, 542, 549, 551, 560, 563, 574, 1077 and 1092 amino
acid position mutations and the nucleotide sequence
variants at DNA sequence positions 129, 556, 621+1,
711+1, 1717-1 and 3659 as disclosed herein, carrier
screening and prenatal diagnosis can be carried out as
follows.

The high risk population for cystic fibrosis is
Caucasians. For example, each Caucasian woman and/or man
of child-bearing age would be screened to determine if
she or he was a carrier (approximately a 5% probability
for each individual). If both are carriers, they are a
couple at risk for a cystic fibrosis child. Each child
of the at risk couple has a 25% chance of being affected
with cystic fibrosis. The procedure for determining
carrier status using the probes disclosed herein is as
follows.
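
The risk figures quoted above can be combined in a short worked example. This is only a sketch: it assumes that the carrier status of the two partners is independent, which the text does not state, and the names used are hypothetical.

```python
from fractions import Fraction

# Figures quoted in the text.
CARRIER_PROBABILITY = Fraction(5, 100)            # "approximately a 5% probability"
AFFECTED_GIVEN_BOTH_CARRIERS = Fraction(25, 100)  # "25% chance of being affected"


def couple_at_risk(p_carrier=CARRIER_PROBABILITY):
    """Probability that both partners are carriers (assumes independence)."""
    return p_carrier * p_carrier


def affected_child(p_carrier=CARRIER_PROBABILITY,
                   p_affected=AFFECTED_GIVEN_BOTH_CARRIERS):
    """Probability that a child of a randomly chosen couple is affected."""
    return couple_at_risk(p_carrier) * p_affected


if __name__ == "__main__":
    print("couple at risk:", couple_at_risk())
    print("affected child:", affected_child())
```

Under these assumptions about 1 couple in 400 is at risk, and about 1 child in 1600 of randomly chosen couples would be affected.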

For purposes of brevity, the discussion on screening
by use of one of the selected mutations is directed to
the I507 mutation. It is understood that screening can
also be accomplished using one of the other mutations or
using several of the mutations in a screening process or
mutation detection process of this section on CF
screening involving DNA diagnosis and mutation detection.

One major application of the DNA sequence
information of the normal and 507 mutant CF gene is in
the area of genetic testing, carrier detection and
prenatal diagnosis. Individuals carrying mutations in
the CF gene (disease carrier or patients) may be detected
at the DNA level with the use of a variety of techniques.
The genomic DNA used for the diagnosis may be obtained
from body cells, such as those present in peripheral
blood, urine, saliva, tissue biopsy, surgical specimen
and autopsy material. The DNA may be used directly for
detection of specific sequence or may be amplified
enzymatically in vitro by using PCR [Saiki et al. Science
230: 1350-1353 (1985), Saiki et al. Nature 324: 163-166
(1986)] prior to analysis. RNA or its cDNA form may also
be used for the same purpose. Recent reviews of this
subject have been presented by Caskey [Science 236:
1223-8 (1989)] and by Landegren et al [Science 242: 229-
237 (1989)].

The detection of specific DNA sequence may be
achieved by methods such as hybridization using specific
oligonucleotides [Wallace et al. Cold Spring Harbour
Symp. Quant. Biol. 51: 257-261 (1986)], direct DNA
sequencing [Church and Gilbert, Proc. Nat. Acad. Sci. U.
S. A. 81: 1991-1995 (1988)], the use of restriction
enzymes [Flavell et al. Cell 15: 25 (1978), Geever et al
Proc. Nat. Acad. Sci. U. S. A. 78: 5081 (1981)],
discrimination on the basis of electrophoretic mobility
in gels with denaturing reagent (Myers and Maniatis, Cold
Spring Harbour Symp. Quant. Biol. 51: 275-284 (1986)),
RNase protection (Myers, R. M., Larin, J., and T.
Maniatis Science 230: 1242 (1985)), chemical cleavage
(Cotton et al Proc. Nat. Acad. Sci. U. S. A. 85: 4397-
4401 (1985)) and the ligase-mediated detection procedure
[Landegren et al Science 241:1077 (1988)].

Oligonucleotides specific to normal or mutant
sequences are chemically synthesized using commercially
available machines, labelled radioactively with isotopes
(such as 32P) or non-radioactively (with tags such as
biotin (Ward and Langer et al. Proc. Nat. Acad. Sci. U.
S. A. 78: 6633-6657 (1981))), and hybridized to individual
DNA samples immobilized on membranes or other solid
supports by dot-blot or transfer from gels after
electrophoresis. The presence or absence of these
specific sequences is visualized by methods such as
autoradiography or fluorometric (Landegren et al, 1989,
supra) or colorimetric reactions (Gebeyehu et al. Nucleic
Acids Research 15: 4513-4534 (1987)). An embodiment of
this oligonucleotide screening method has been applied in
the detection of the I507 deletion as described herein.

Sequence differences between normal and mutants may
be revealed by the direct DNA sequencing method of Church
and Gilbert (supra). Cloned DNA segments may be used as
probes to detect specific DNA segments. The sensitivity
[...] migration pattern of DNA heteroduplexes in non-denaturing
gel electrophoresis, as have been detected for the 3 bp
(I507) mutation and in other experimental systems
[Nagamine et al, Am. J. Hum. Genet. 45:337-339 (1989)].

Alternatively, a method of detecting a mutation
comprising a single base substitution or other small
change could be based on differential primer length in a
PCR. For example, one invariant primer could be used in
addition to a primer specific for a mutation. The PCR
products of the normal and mutant genes can then be
differentially detected in acrylamide gels.

Sequence changes at specific locations may also be
revealed by nuclease protection assays, such as RNase
(Myers, supra) and S1 protection (Berk, A. J., and P. A.
Sharpe Proc. Nat. Acad. Sci. U. S. A. 75: 1274 (1978)),
the chemical cleavage method (Cotton, supra) or the
ligase-mediated detection procedure (Landegren supra).

In addition to conventional gel-electrophoresis and
blot-hybridization methods, DNA fragments may also be
visualized by methods where the individual DNA samples
are not immobilized on membranes. The probe and target
sequences may be both in solution or the probe sequence
may be immobilized [Saiki et al, Proc. Natl. Acad. Sci.
USA, 86:6230-6234 (1989)]. A variety of detection
methods, such as autoradiography involving radioisotopes,
direct detection of radioactive decay (in the presence or
absence of scintillant), spectrophotometry involving
colorigenic reactions and fluorometry involving
fluorogenic reactions, may be used to identify specific
individual genotypes.

Since more than one mutation is anticipated in the
CF gene such as I507 and F508, a multiplex system is an
ideal protocol for screening CF carriers and detection of
specific mutations. For example, a PCR with multiple,
specific oligonucleotide primers and hybridization
probes may be used to identify all possible mutations at
the same time (Chamberlain et al. Nucleic Acids Research
16: 1141-1155 (1988)). The procedure may involve
immobilized sequence-specific oligonucleotide probes
(Saiki et al, supra).

DETECTING THE CF 507 MUTATION

These detection methods may be applied to prenatal
diagnosis using amniotic fluid cells, chorionic villi
biopsy or sorting fetal cells from maternal circulation.
The test for CF carriers in the population may be
incorporated as an essential component in a broad-scale
genetic testing program for common diseases.

According to an embodiment of the invention, the
portion of the DNA segment that is informative for a
mutation, such as the mutation according to this
embodiment, that is, the portion that immediately
surrounds the I507 deletion, can then be amplified by
using standard PCR techniques [as reviewed in Landegren,
Ulf, Robert Kaiser, C. Thomas Caskey, and Leroy Hood, DNA
Diagnostics - Molecular Techniques and Automation, in
Science 242: 229-237 (1988)]. It is contemplated that
the portion of the DNA segment which is used may be a
single DNA segment or a mixture of different DNA
segments. A detailed description of this technique now
follows.

A specific region of genomic DNA from the person or
fetus is to be screened. Such specific region is defined
by the oligonucleotide primers C16B
(5'GTTTTCCTGGATTATGCCTGGCAC3') and C16D
(5'GTTGGCATGCTTTGATGACGCTTC3') or as shown in Figure 18
by primers 10i-5 and 10i-3. The specific regions using
10i-5 and 10i-3 were amplified by the polymerase chain
reaction (PCR). 200-400 ng of genomic DNA, from either
cultured lymphoblasts or peripheral blood samples of CF
individuals and their parents, were used in each PCR with
the oligonucleotide primers indicated above. The
oligonucleotides were purified with Oligonucleotide
Purification Cartridges™ (Applied Biosystems) or NENSORB™
PREP columns (Dupont) with procedures recommended by the
suppliers. The primers were annealed at 55°C for 30 sec,
extended at 72°C for 60 sec (with 2 units of Taq DNA
polymerase) and denatured at 94°C for 60 sec, for 30
cycles with a final cycle of 7 min for extension in a
Perkin-Elmer/Cetus automatic thermocycler with a
Step-Cycle program (transition setting at 1.5 min). Portions
of the PCR products were separated by electrophoresis on
1.4% agarose gels and transferred to Zetabind™ (Biorad)
membrane according to standard procedures.

The normal and ΔI507 oligonucleotide probes of
Figure 19 (10 ng each) are labeled separately with 10
units of T4 polynucleotide kinase (Pharmacia) in a 10 µl
reaction containing 50 mM Tris-HCl (pH 7.6), 10 mM MgCl2,
0.5 mM dithiothreitol, 10 mM spermidine, 1 mM EDTA and
30-40 µCi of γ(32P)-ATP for 20-30 min at 37°C. The
unincorporated radionucleotides were removed with a
Sephadex G-25 column before use. The hybridization
conditions were as described previously (J. M. Rommens et
al Am. J. Hum. Genet. 43, 645 (1988)) except that the
temperature can be 37°C. The membranes are washed twice
at room temperature with 5xSSC and twice at 39°C with 2 x
SSC (1 x SSC = 150 mM NaCl and 15 mM Na citrate).
Autoradiography is performed at room temperature
overnight. Autoradiographs are developed to show the
hybridization results of genomic DNA with the 2 specific
oligonucleotide probes. Probe C normal detects the
normal DNA sequence and Probe C ΔI507 detects the mutant
sequence.
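
As a worked example of the SSC wash strengths mentioned above, the concentrations of the 5x and 2x solutions follow directly from the 1 x SSC definition given in the text. The helper name `ssc` is hypothetical.

```python
# 1 x SSC as defined in the text, in mM.
SSC_1X_MM = {"NaCl": 150, "Na citrate": 15}


def ssc(strength):
    """Return solute concentrations (mM) for SSC at the given multiple of 1x."""
    return {solute: mm * strength for solute, mm in SSC_1X_MM.items()}


if __name__ == "__main__":
    for strength in (5, 2):
        print(f"{strength} x SSC:", ssc(strength))
```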

Genomic DNA sample from each family member can, as 
explained, be amplified by the polymerase chain reaction 
using the intron sequences of Figure 18 and the products 
separated by electrophoresis on a 1.4% agarose gel and 
then transferred to Zetabind (Biorad) membrane according 
to standard procedures. The 3 bp deletion of ΔI507 can be
revealed by a very convenient polyacrylamide gel
electrophoresis procedure. When the PCR products
generated by the above-mentioned 10i-5 and 10i-3 primers
are applied to a 5% polyacrylamide gel, electrophoresed
for 3 hrs at 20 V/cm in a 90 mM Tris-borate buffer (pH
8.3), DNA fragments of a different mobility are clearly
detectable for individuals without the 3 bp deletion,
heterozygous or homozygous for the deletion.

As already explained with respect to Figure 20, the 
PCR amplified genomic DNA can be subjected to gel 
electrophoresis to identify the 3 bp deletion. As shown 
in Figure 20, in the four lanes the first lane is a 
control with a normal/ΔF508 deletion. The next lane is
the father with a normal/ΔI507 deletion. The third lane
is the mother with a normal/ΔF508 deletion and the fourth
lane is the child with a ΔF508/ΔI507 deletion. The
homoduplexes show up as solid bands across the base of
each lane. In lanes 1 and 3, the two heteroduplexes show
up very clearly as two spaced apart bands. In lane 2, the
father's ΔI507 mutation shows up very clearly, whereas in
the fourth lane, the child with the adjacent 507, 508
mutations, there are no distinguishable heteroduplexes.
Hence the showing is at the homoduplex line. Since the
father in lane 2 and the mother in lane 3 show
heteroduplex banding and the child does not, this indicates
either the child is normal or is a patient. This can be
further checked if needed, such as in embryonic analysis, by
mixing the 507 and 508 probes to determine the presence
of the ΔI507 and ΔF508 mutations.

Similar alteration in gel mobility for 
heteroduplexes formed during PCR has also been reported 
for experimental systems where small deletions are 
involved (Nagamine et al supra). These mobility shifts
may be used in general as the basis for the
non-radioactive genetic screening tests.

5.2 CF SCREENING PROGRAMS

It is appreciated that approximately 1% of the
carriers can be detected using the specific ΔI507 probes
of this particular embodiment of the invention. Thus, if
an individual tested is not a carrier using the ΔI507
probes, their carrier status cannot be excluded; they
may carry some other mutation, such as the ΔF508 as
previously noted. However, if both the individual and
the spouse of the individual tested are carriers for the
ΔI507 mutation, it can be stated with certainty that they
are an at risk couple. The sequence of the gene as
disclosed herein is an essential prerequisite for the
determination of the other mutations.

Prenatal diagnosis is a logical extension of carrier
screening. A couple can be identified as at risk for
having a cystic fibrosis child in one of two ways: if
they already have a cystic fibrosis child, they are both,
by definition, obligate carriers of the defective CFTR
gene, and each subsequent child has a 25% chance of being
affected with cystic fibrosis. A major advantage of the
present invention is that it eliminates the need for family
pedigree analysis: according to this invention, a gene
mutation screening program as outlined above or other
similar method can be used to identify a genetic mutation
that leads to a protein with altered function. This is
not dependent on prior ascertainment of the family
through an affected child. Fetal DNA samples, for
example, can be obtained, as previously mentioned, from
amniotic fluid cells and chorionic villi specimens.
Amplification by standard PCR techniques can then be
performed on this template DNA.

If both parents are shown to be carriers with the
ΔI507 deletion, the interpretation of the results would
be the following. If there is hybridization of the fetal
DNA to the normal probe, the fetus will not be affected
with cystic fibrosis, although it may be a CF carrier
(50% probability for each fetus of an at risk couple). If
the fetal DNA hybridizes only to the ΔI507 deletion probe
and not to the normal probe, the fetus will be affected
with cystic fibrosis.
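
The interpretation rules just described can be summarized as a small decision function. This is a minimal sketch, assuming both parents are carriers of the same deletion as stated in the text. The function name, the wording of the returned labels, the split of the "normal probe hybridizes" case into carrier and non-carrier according to the deletion probe, and the "inconclusive" outcome when neither probe hybridizes are all assumptions added for illustration.

```python
def interpret_fetal_result(hybridizes_normal, hybridizes_deletion):
    """Interpret fetal DNA hybridization when both parents carry the same deletion.

    hybridizes_normal:   fetal DNA hybridizes to the normal probe
    hybridizes_deletion: fetal DNA hybridizes to the deletion probe
    """
    if hybridizes_normal and hybridizes_deletion:
        return "not affected, carrier"
    if hybridizes_normal:
        return "not affected, not a carrier of this deletion"
    if hybridizes_deletion:
        return "affected"
    # Neither probe gives a signal: not covered by the text, treated here as a failed assay.
    return "inconclusive"


if __name__ == "__main__":
    for normal in (True, False):
        for deletion in (True, False):
            print(normal, deletion, "->", interpret_fetal_result(normal, deletion))
```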

It is appreciated that for this and other mutations
in the CF gene, a range of different specific procedures
can be used to provide a complete diagnosis for all
potential CF carriers or patients. A complete
description of these procedures is given later.

The invention therefore provides a method and kit
for determining if a subject is a CF carrier or CF
patient. In summary, the screening method comprises the
steps of:

providing a biological sample of the subject to be
screened; and providing an assay for detecting in the
biological sample, the presence of at least a member from
the group consisting of a 507 mutant CF gene, 507 mutant
CF gene products and mixtures thereof.

The method may be further characterized by including
at least one more nucleotide probe which is a different
DNA sequence fragment of, for example, the DNA of Figure
1, or a different DNA sequence fragment of human
chromosome 7 and located to either side of the DNA
sequence of Figure 1. In this respect, the DNA fragments
of the intron portions of Figure 2 are useful in further
confirming the presence of the mutation. [...] aspects
of the introns at the exon boundaries may be relied upon
in [...] procedures to further confirm the presence
of the mutation at the I507 position or other mutant
positions.

A kit, according to an embodiment of the invention,
suitable for use in the screening technique and for
assaying for the presence of the mutant CF gene by an
immunoassay comprises:

(a) an antibody which specifically binds to a gene
product of the mutant CF gene having a mutation at one of
the positions 85, 148, 178, 455, 493, 507, 542, 549,
551, 560, 563, 574, 1077 and 1092;

(b) reagent means for detecting the binding of the
antibody to the gene product; and

(c) the antibody and reagent means each being
present in amounts effective to perform the immunoassay.

The kit for assaying for the presence [...]

(a) an oligonucleotide probe which specifically
binds to the mutant CF gene having a mutation at one of
the positions 85, 148, 178, 455, 493, 507, 542, 549, 551,
560, 563, 574, 1077 and 1092;

(b) reagent means for detecting the hybridization
of the oligonucleotide probe to the mutant CF gene; and

(c) the probe and reagent means each being present
in amounts effective to perform the hybridization assay.

ANTIBODIES TO DETECT MUTANT CFTR

As mentioned, antibodies to epitopes within the
mutant CFTR protein at positions 85, 148, 178, 455,
[...] are
raised to provide extensive information on the
characteristics of the mutant protein and other valuable
information which includes:

1. The antibodies can be used to provide another
technique in detecting any of the other CF mutations
[...]

2. Antibodies to distinct domains of the mutant
protein can be used to determine the topological
arrangement of the protein in the cell membrane.
This provides information on segments of the protein
which are accessible to externally added modulating
agents for purposes of drug therapy.

3. The structure-function relationships of
portions of the protein can be examined using
specific antibodies. For example, it is possible to
introduce into cells antibodies recognizing each of
the charged cytoplasmic loops which join the
transmembrane sequences as well as portions of the
nucleotide binding folds and the R-domain. The
influence of these antibodies on functional
parameters of the protein provides insight into cell
regulatory mechanisms and potentially suggests means
of modulating the activity of the defective protein
in a CF patient.

4. Antibodies with the appropriate avidity also
enable immunoprecipitation and immuno-affinity
purification of the protein. Immunoprecipitation
will facilitate characterization of synthesis and
post-translational modification including ATP
binding and phosphorylation. Purification will be
required for studies of protein structure and for
reconstitution of its function, as well as
protein-based therapy.

In order to prepare the antibodies, fusion proteins
containing defined portions of any one of the mutant CFTR
polypeptides can be synthesized in bacteria by expression
of the corresponding mutant DNA sequence in a suitable
cloning vehicle. Smaller peptides may be synthesized
chemically. The fusion proteins can be purified, for
example, by affinity chromatography on glutathione-agarose
and the peptides coupled to a carrier protein
(hemocyanin), mixed with Freund's adjuvant and injected
into rabbits. Following booster injections at bi-weekly
intervals, the rabbits are bled and sera isolated. The
developed polyclonal antibodies in the sera may then be
combined with the fusion proteins. Immunoblots are then
formed by staining with, for example, alkaline-phosphatase
conjugated second antibody in accordance with
the procedure of Blake et al, Anal. Biochem. 136:175
(1984).

Thus, it is possible to raise polyclonal antibodies
specific for both fusion proteins containing portions of
the mutant CFTR protein and peptides corresponding to
short segments of its sequence. Similarly, mice can be
injected with KLH conjugates of peptides to initiate the
production of monoclonal antibodies to corresponding
segments of mutant CFTR protein.

As for the generation of monoclonal antibodies,
immunogens for the raising of monoclonal antibodies
(mAbs) to the mutant CFTR protein are bacterial fusion
proteins [Smith et al, Gene 67:31 (1988)] containing
portions of the CFTR polypeptide or synthetic peptides
corresponding to short (12 to 25 amino acids in length)
segments of the mutant sequence. The essential
methodology is that of Kohler and Milstein [Nature
256:495 (1975)].

Balb/c mice are immunized by intraperitoneal
injection with 500 µg of pure fusion protein or synthetic
peptide in incomplete Freund's adjuvant. A second
injection is given after 14 days, a third after 21 days
and a fourth after 28 days. Individual animals so
immunized are sacrificed one, two and four weeks
following the final injection. Spleens are removed,
their cells dissociated, collected and fused with
Sp2/O-Ag14 myeloma cells according to Gefter et al,
Somatic Cell Genetics 3:231 (1977). The fusion mixture is
distributed in culture medium selective for the
propagation of fused cells which are grown until they are
about 25% confluent. At this time, culture supernatants
are tested for the presence of antibodies reacting with a
particular CFTR antigen. An alkaline phosphatase
labelled anti-mouse second antibody is then used for
detection of positives. Cells from positive culture
wells are then expanded in culture, their supernatants
collected for further testing and the cells stored deep
frozen in cryoprotectant-containing medium. To obtain
large quantities of a mAb, producer cells are injected
into the peritoneum at 5 x 10^6 cells per animal, and
ascites fluid is obtained. Purification is by
chromatography on Protein G- or Protein A-agarose
according to Ey et al, Immunochemistry 15:429 (1977).

Reactivity of these mAbs with the mutant CFTR
protein can be confirmed by polyacrylamide gel
electrophoresis of membranes isolated from epithelial
cells in which it is expressed and immunoblotted [Towbin
et al, Proc. Natl. Acad. Sci. USA 76:4350 (1979)].

In addition to the use of monoclonal antibodies
specific for the particular mutant domain of the CFTR
protein to probe their individual functions, other mAbs,
which can distinguish between the normal and mutant forms
of CFTR protein, are used to detect the mutant protein in
epithelial cell samples obtained from patients, such as
nasal mucosa biopsy "brushings" [R. De-Lough and J.
Rutland, J. Clin. Pathol. 42, 613 (1989)] or skin biopsy
specimens containing sweat glands.

Antibodies capable of this distinction are obtained
by differentially screening hybridomas from paired sets
of mice immunized with a peptide containing, for example,
the isoleucine at amino acid position 507 (e.g.
GTIKENIIFGVSY) or a peptide which is identical except for
the absence of I507 (GTIKENIFGVSY). mAbs capable of
recognizing the other mutant forms of CFTR protein
present in patients in addition to or instead of the I507
deletion are obtained using similar monoclonal antibody
production strategies.
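
The relationship between the two immunizing peptides can be illustrated with a short sketch. It is only an illustration under stated assumptions: the numbering of the first glycine as residue 500 is assumed here so that the second isoleucine of the pair falls at position 507, and the function name is hypothetical.

```python
def delete_residue(peptide: str, first_residue: int, position: int) -> str:
    """Return the peptide with the residue at the given protein position removed."""
    index = position - first_residue
    if not 0 <= index < len(peptide):
        raise ValueError("position lies outside the peptide")
    return peptide[:index] + peptide[index + 1:]


# Peptide taken from the text; it contains the isoleucine at position 507.
NORMAL_PEPTIDE = "GTIKENIIFGVSY"
# Assumption for illustration: the leading glycine is numbered 500.
FIRST_RESIDUE = 500

mutant_peptide = delete_residue(NORMAL_PEPTIDE, FIRST_RESIDUE, 507)
print(mutant_peptide)
```

Because the two adjacent residues are both isoleucine, removing either one yields the same shortened peptide, which is why the paired peptides differ only in length at that point.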

Antibodies to normal and CF versions of CFTR protein
and of segments thereof are used diagnostically in
immunocytochemical and immunofluorescence light
microscopy and immunoelectron microscopy to demonstrate
the tissue, cellular and subcellular distribution of CFTR
within the organs of CF patients, carriers and non-CF
individuals.

Antibodies are used therapeutically to modulate, by
promoting, the activity of the CFTR protein in CF patients
and in cells of CF patients. Possible modes of such
modulation might involve stimulation due to cross-linking
of CFTR protein molecules with multivalent antibodies in
analogy with stimulation of some cell surface membrane
receptors, such as the insulin receptor [O'Brien et al,
Euro. Mol. Biol. Organ. J. 6:4003 (1987)], epidermal
growth factor receptor [Schreiber et al, J. Biol. Chem.
258:846 (1983)] and T-cell receptor-associated molecules
such as CD4 [Veillette et al, Nature 338:257 (1989)].

Antibodies are used to direct the delivery of
therapeutic agents to the cells which express defective
CFTR protein in CF. For this purpose, the antibodies are
incorporated into a vehicle such as a liposome [Matthay
et al, Cancer Res. 46:4904 (1986)] which carries the
therapeutic agent such as a drug or the normal gene.

DNA diagnosis is currently being used to assess
whether a fetus will be born with cystic fibrosis, but
historically this has only been done after a particular
set of parents has already had one cystic fibrosis child,
which identifies them as obligate carriers. However, in
combination with carrier detection as outlined above, DNA
diagnosis for all pregnancies of carrier couples will be
possible. If the parents have already had a cystic
fibrosis child, an extended haplotype analysis can be
done on the fetus and thus the percentage of false
positives or false negatives will be greatly reduced. If
the parents have not already had an affected child and
the DNA diagnosis on the fetus is being performed on the
basis of carrier detection, haplotype analysis can still
be performed.

Although it has been thought for many years that 
there is a great deal of clinical heterogeneity in the 
cystic fibrosis disease, it is now emerging that there 
are two general categories, called pancreatic sufficiency 
(CF-PS) and pancreatic insufficiency (CF-PI). If the
mutations related to these disease categories are well
characterized, one can associate a particular mutation
with a clinical phenotype of the disease. This allows
changes in the treatment of each patient. Thus the
nature of the mutation will to a certain extent predict
the prognosis of the patient and indicate a specific
treatment.

MOLECULAR BIOLOGY OF CYSTIC FIBROSIS

The postulate that CFTR may regulate the activity of
ion channels, particularly the outwardly rectifying Cl
channel implicated as the functional defect in CF, can be
tested by the injection and translation of full length in
vitro transcribed CFTR mRNA in Xenopus oocytes. The
ensuing changes in ion currents across the oocyte
membrane can be measured as the potential is clamped at a
fixed value. CFTR may regulate endogenous oocyte
channels or it may be necessary to also introduce
epithelial cell RNA to direct the translation of channel
proteins. Use of mRNA coding for normal and for mutant
CFTR, as provided by this invention, makes these
experiments possible.

Other modes of expression in heterologous cell
systems also facilitate dissection of structure-function
relationships. The complete CFTR DNA sequence ligated
into a plasmid expression vector is used to transfect
cells so that its influence on ion transport can be
assessed. Plasmid expression vectors containing part of
the normal CFTR sequence along with portions of modified
sequence at selected sites can be used in in vitro
mutagenesis experiments performed in order to identify
those portions of the CFTR protein which are crucial for
regulatory function.

EXPRESSION OF THE MUTANT DNA SEQUENCE

The mutant DNA sequence can be manipulated in
studies to understand the expression of the gene and its
product, and to achieve production of large quantities
of the protein for functional analysis, antibody
production, and patient therapy. The changes in the
sequence may or may not alter the expression pattern in
terms of relative quantities, tissue-specificity and
functional properties. The partial or full-length cDNA
sequences, which encode the subject protein,
unmodified or modified, may be ligated to bacterial
expression vectors such as the pRIT (Nilsson et al. EMBO
J. 4: 1075-1080 (1985)), pGEX (Smith and Johnson, Gene
67: 31-40 (1988)) or pATH (Spindler et al. J. Virol. 49:
132-141 (1984)) plasmids which can be introduced into E.
coli cells for production of the corresponding proteins
which may be isolated in accordance with the previously
discussed protein purification procedures. The DNA
sequence can also be transferred from its existing
context to other cloning vehicles, such as other
plasmids, bacteriophages, cosmids, animal viruses, yeast
artificial chromosomes (YAC) (Burke et al. Science 236:
806-812 (1987)), somatic cells, and other simple or
complex organisms, such as bacteria, fungi (Timberlake
and Marshall, Science 244: 1313-1317 (1989)),
invertebrates, plants (Gasser and Fraley, Science 244:
1293 (1989)), and pigs (Pursel et al. Science 244: 1281-
1288 (1989)).

For expression in mammalian cells, the cDNA sequence
may be ligated to heterologous promoters, such as the
simian virus (SV) 40 promoter in the pSV2 vector
[Mulligan and Berg, Proc. Natl. Acad. Sci. USA 78:2072-
2076 (1981)] and introduced into cells, such as monkey
COS-1 cells [Gluzman, Cell 23:175-182 (1981)], to
achieve transient or long-term expression. The stable
integration of the chimeric gene construct may be
maintained in mammalian cells by biochemical selection,
such as neomycin [Southern and Berg, J. Mol. Appl.
Genet. 1:327-341 (1982)] and mycophenolic acid [Mulligan
and Berg, supra].

DNA sequences can be manipulated with standard
procedures such as restriction enzyme digestion, fill-in
with DNA polymerase, deletion by exonuclease, extension
by terminal deoxynucleotide transferase, ligation of
synthetic or cloned DNA sequences, and site-directed
sequence alteration via a single-stranded bacteriophage
intermediate or with the use of specific oligonucleotides
in combination with PCR.




The cDNA sequence (or portions derived from it), or
a mini gene (a cDNA with an intron and its own promoter),
is introduced into eukaryotic expression vectors by
conventional techniques. These vectors are designed to
permit the transcription of the cDNA in eukaryotic cells
by providing regulatory sequences that initiate and
enhance the transcription of the cDNA and ensure its
proper splicing and polyadenylation. Vectors containing
the promoter and enhancer regions of the simian virus
(SV)40 or long terminal repeat (LTR) of the Rous Sarcoma
virus and polyadenylation and splicing signal from SV40
are readily available [Mulligan et al Proc. Natl. Acad.
Sci. USA 78:2072-2076 (1981); Gorman et al Proc. Natl.
Acad. Sci. USA 79:6777-6781 (1982)]. Alternatively, the
CFTR endogenous promoter may be used. The level of
expression of the cDNA can be manipulated with this type
of vector, either by using promoters that have different
activities (for example, the baculovirus pAC373 can
express cDNAs at high levels in S. frugiperda cells [M.
D. Summers and G. E. Smith in Genetically Altered
Viruses and the Environment (B. Fields et al, eds.), vol

Sn^— ^^^ient 

In addition, some vectors contain selectable markers
[such as the gpt [Mulligan and Berg, supra] or neo
[Southern and Berg J. Mol. Appl. Genet. 1:327-341
(1982)] bacterial genes] that permit isolation of cells, by
chemical selection, that have stable, long term
expression of the vectors (and therefore the cDNA) in the
recipient cell. The vectors can be maintained in the
cells as episomal, freely replicating entities by using
regulatory elements of viruses such as papilloma [Sarver
et al Mol. Cell Biol. 1:486 (1981)] or Epstein-Barr
[Sugden et al Mol. Cell Biol. 5:410 (1985)].
Alternatively, one can also produce cell lines that have 
integrated the vector into genomic DNA. Both of these 
types of cell lines produce the gene product on a 
continuous basis. One can also produce cell lines that 
have amplified the number of copies of the vector (and 
therefore of the cDNA as well) to create cell lines that 
can produce high levels of the gene product [Alt et al.
J. Biol. Chem. 253: 1357 (1978)].

The transfer of DNA into eukaryotic, in particular
human or other mammalian, cells is now a conventional
technique. The vectors are introduced into the recipient
cells as pure DNA (transfection) by, for example,
precipitation with calcium phosphate [Graham and van der
Eb, Virology 52:466 (1973)] or strontium phosphate [Brash
et al Mol. Cell Biol. 7:2013 (1987)], electroporation
[Neumann et al EMBO J 1:841 (1982)], lipofection [Felgner
et al Proc. Natl. Acad. Sci. USA 84:7413 (1987)], DEAE
dextran [McCuthan et al J. Natl. Cancer Inst.
(1968)], microinjection [Mueller et al Cell 15:579 (1978)],
protoplast fusion [Schafner, Proc. Natl. Acad. Sci. USA
72:2163] or pellet guns [Klein et al, Nature 327:70
(1987)]. Alternatively, the cDNA can be introduced by
infection with virus vectors. Systems are developed that
use, for example, retroviruses [Bernstein et al. Genetic
Engineering 7: 235 (1985)], adenoviruses [Ahmad et al J.
Virol 57:267 (1986)] or Herpes virus [Spaete et al Cell
30:295 (1982)].

These eukaryotic expression systems can be used for 
many studies of the mutant CF gene and the mutant CFTR 
product, such as at protein positions 85, 148, 178, 455, 
493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092. 
These include, for example: (1) determination that the
gene is properly expressed and that all post-translational
modifications necessary for full biological
activity have been properly completed; (2) identification
of regulatory elements located in the 5' region of the CF
gene and their role in the tissue- or temporal-regulation
of the expression of the CF gene; (3) production of large
amounts of the normal protein for isolation and
purification; (4) use of cells expressing the CFTR protein
as an assay system for antibodies generated against the
CFTR protein or an assay system to test the
effectiveness of drugs; and (5) study of the function of
the normal complete protein, specific portions of the
protein, or of naturally occurring or artificially
produced mutant proteins. Naturally occurring mutant
proteins exist in patients with CF while artificially
produced mutant protein can be designed by site-directed
sequence alterations. These latter studies can probe the
function of any desired amino acid residue in the protein
by mutating the nucleotides coding for that amino acid.
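
As an illustration of altering the nucleotides that code for a chosen residue, the following sketch replaces one codon in a coding sequence. It is a toy example under stated assumptions: the twelve-nucleotide sequence and the function name are hypothetical and are not taken from the CF gene.

```python
def replace_codon(coding_sequence: str, residue_number: int, new_codon: str) -> str:
    """Replace the codon for a 1-based residue number in an in-frame coding sequence."""
    if len(coding_sequence) % 3 != 0:
        raise ValueError("coding sequence must be a whole number of codons")
    if len(new_codon) != 3:
        raise ValueError("a codon is exactly three nucleotides")
    start = (residue_number - 1) * 3
    if start < 0 or start + 3 > len(coding_sequence):
        raise ValueError("residue number lies outside the coding sequence")
    return coding_sequence[:start] + new_codon.upper() + coding_sequence[start + 3:]


# Hypothetical toy sequence: ATG GGT ATC TTT (Met-Gly-Ile-Phe).
TOY_SEQUENCE = "ATGGGTATCTTT"

# Change the second codon from GGT (Gly) to GAT (Asp).
mutated = replace_codon(TOY_SEQUENCE, 2, "GAT")
print(mutated)
```

Only the three nucleotides of the targeted codon change, so the reading frame of the remainder of the sequence is preserved.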

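Using the above techniques, the expression vectors
containing the mutant CF gene sequence or fragments
thereof can be introduced into human cells, mammalian
cells from other species or non-mammalian cells as
desired. The choice of cell is determined by the purpose
of the treatment. For example, monkey COS cells, which
produce high levels of the SV40 T antigen and permit the
replication of vectors containing the SV40 origin of
replication, can be used to show that the vector can
express the protein product, since function is not
required. Similar treatment could be performed with
Chinese hamster ovary cells, fibroblasts or lymphoblasts.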

The recombinant cloning vector, according to this
invention, then comprises the selected DNA of the DNA
sequences of this invention for expression in a suitable
host. The DNA is operatively linked in the vector to an
expression control sequence in the recombinant DNA
molecule so that normal CFTR polypeptide can be
expressed. The expression control sequence may be
selected from the group consisting of sequences that
control the expression of genes of prokaryotic or
eukaryotic cells and their viruses and combinations
thereof. The expression control sequence may be
specifically selected from the group consisting of the
lac system, the trp system, the tac system, the trc
system, major operator and promoter regions of phage
lambda, the control region of fd coat protein, the early
and late promoters of SV40, promoters derived from
polyoma, adenovirus, retrovirus, baculovirus and simian
virus, the promoter for 3-phosphoglycerate kinase, the
promoters of yeast acid phosphatase, the promoter of the
yeast alpha-mating factors and combinations thereof.

The host cell, which may be transfected with the
vector of this invention, may be selected from the group
consisting of E. coli, Pseudomonas, Bacillus subtilis,
Bacillus stearothermophilus or other bacilli; other
bacteria; yeast; fungi; insect; mouse or other animal;
or plant hosts; or human tissue cells.

It is appreciated that for the mutant DNA sequence 
similar systems are employed to express and produce the 
mutant product. 

PROTEIN FUNCTION CONSIDERATIONS

To study the function of the mutant CFTR protein, it
is preferable to use epithelial cells as recipients,
since proper functional expression may require the
presence of other pathways or gene products that are only
expressed in such cells. Cells that can be used include,
for example, human epithelial cell lines such as T84
(ATCC #CRL 248) or PANC-1 (ATCC # CLL 1469), or the T43
immortalized CF nasal epithelium cell line [Jettan et al,
Science (1989)] and primary [Yanhoskes et al. Am. Rev.
Resp. Dis. 132: 1281 (1985)] or transformed [Scholte et
al. Exp. Cell. Res. 182: 559 (1989)] human nasal polyp or
airways cells, pancreatic cells [Harris and Coleman
J. Cell. Sci. 87: 695 (1987)], or sweat gland cells [Collie
et al. In Vitro 21: 597 (1985)] derived from normal or CF
subjects. The CF cells can be used to test for the
functional activity of mutant CF genes. Current
functional assays available include the study of the
movement of anions (Cl or I) across cell membranes as a
function of stimulation of cells by agents that raise
intracellular AMP levels and activate chloride channels
[Stutto et al. Proc. Nat. Acad. Sci. U.S.A. 82: 6677
(1985)]. Other assays include the measurement of changes
in cellular potentials by patch clamping of whole cells
or of isolated membranes [Frizzell et al. Science 233:
558 (1986), Welsch and Liedtke Nature 322: 467 (1986)] or
the study of ion fluxes in epithelial sheets of confluent
cells [Widdicombe et al. Proc. Nat. Acad. Sci. 82: 6167
(1985)]. Alternatively, RNA made from the CF gene could
be injected into Xenopus oocytes. The oocyte will
translate RNA into protein and allow its study. As other
more specific assays are developed these can also be used
in the study of transfected mutant CFTR protein function.

"Domain-switching" experiments between mutant CFTR
and the human multidrug resistance P-glycoprotein can
also be performed to further the study of the mutant CFTR
protein. In these experiments, plasmid expression vectors
are constructed by routine techniques from fragments of
the mutant CFTR sequence and fragments of the sequence of
P-glycoprotein ligated together by DNA ligase so that a
protein containing the respective portions of these two
proteins will be synthesized by a host cell transfected
with the plasmid. The latter approach has the advantage
that many experimental parameters associated with
multidrug resistance can be measured. Hence, it is now
possible to assess the ability of segments of mutant CFTR
to influence these parameters.
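
The assembly of such a chimeric coding sequence can be sketched as follows. This is a minimal illustration only: the fragment sequences are arbitrary placeholders rather than CFTR or P-glycoprotein sequence, and the names Fragment and assemble_chimera are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Fragment:
    source: str    # label for the gene the fragment is taken from
    sequence: str  # in-frame coding nucleotides (placeholder values below)


def assemble_chimera(fragments: Sequence[Fragment]) -> str:
    """Join in-frame fragments end to end, as ligation would, and return the result."""
    for fragment in fragments:
        if len(fragment.sequence) % 3 != 0:
            raise ValueError(f"fragment from {fragment.source} is not in frame")
    return "".join(fragment.sequence for fragment in fragments)


# Placeholder fragments for illustration; not real gene sequence.
parts = [
    Fragment("mutant CFTR", "ATGCAGAGG"),
    Fragment("P-glycoprotein", "GATCTTGAA"),
    Fragment("mutant CFTR", "GGTTAA"),
]
chimera = assemble_chimera(parts)
print(chimera)
```

Requiring each fragment to be a whole number of codons is a simple way of keeping the reading frame intact across the junctions, so that the encoded protein contains the intended portions of both parents.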

These studies of the influence of mutant CFTR on ion
transport will serve to bring the field of epithelial
transport into the molecular arena.

THERAPIES

It is understood that the major aim of the various
biochemical studies using the compositions of this
invention is the development of therapies to circumvent
or overcome the CF defect, using both the pharmacological
and the "gene-therapy" approaches.

In the pharmacological approach, drugs which
circumvent or overcome the CF defect are sought.
Initially, compounds may be tested essentially at random,
and screening systems are required to discriminate among
many candidate compounds. This invention provides host
cell systems, expressing various of the mutant CF genes,
which are particularly well suited for use as first level
screening systems. Preferably, a cell culture system
using mammalian cells (most preferably human cells)
transfected with an expression vector comprising a DNA
sequence coding for CFTR protein containing a
CF-generating mutation, for example the I507 deletion, is
used in the screening process. Candidate drugs are
tested by incubating the cells in the presence of the
candidate drug and measuring those cellular functions
dependent on CFTR, especially by measuring ion currents
where the transmembrane potential is clamped at a fixed
value. To accommodate the large number of assays,
however, more convenient assays are based, for example,
on the use of ion-sensitive fluorescent dyes. To detect
changes in Cl ion concentration, SPQ or its analogues are
useful.
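
A dye-based readout of this kind can be illustrated with a short numerical sketch. It assumes, as a simplification, that SPQ fluorescence is quenched by chloride according to a simple Stern-Volmer relation, F0/F = 1 + K[Cl]. The constant K_SV_PER_MM and the fluorescence readings are hypothetical placeholders, not values from this specification.

```python
# Hypothetical Stern-Volmer constant (per mM chloride); a placeholder, not a measured value.
K_SV_PER_MM = 0.012


def chloride_mM(f_zero: float, f_observed: float, k_sv_per_mM: float = K_SV_PER_MM) -> float:
    """Estimate chloride concentration (mM) from unquenched and observed fluorescence."""
    if f_observed <= 0 or f_zero < f_observed:
        raise ValueError("expected 0 < f_observed <= f_zero")
    # Stern-Volmer relation rearranged: [Cl] = (F0/F - 1) / K
    return (f_zero / f_observed - 1.0) / k_sv_per_mM


# Hypothetical readings in arbitrary units, before and after adding a candidate drug.
baseline = chloride_mM(f_zero=100.0, f_observed=50.0)
with_drug = chloride_mM(f_zero=100.0, f_observed=80.0)
print(round(baseline, 1), round(with_drug, 1))
```

In a screen, a rise in fluorescence after addition of a candidate compound would, under this assumed relation, indicate a fall in the chloride concentration sensed by the dye.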

Alternatively, a cell-free system could be used.
Purified CFTR could be reconstituted into artificial
membranes and drugs could be screened in a cell-free
assay [Al-Aqwatt, Science. (1999)].

At the second level, animal testing is required. It
is possible to develop a model of CF by interfering with
the normal expression of the counterpart of the CF gene
in an animal such as the mouse. The "knock-out" of this
gene by introducing a mutant form of it into the germ
line of animals will provide a strain of animals with
CF-like syndromes. This enables testing of drugs which
showed promise in the first level cell-based screen.

As further knowledge is gained about the nature of
the protein and its function, it will be possible to
predict structures of proteins or other compounds that
interact with the CFTR protein. That in turn will allow
for certain predictions to be made about potential drugs
that will interact with this protein and have some effect
on the treatment of the patients. Ultimately such drugs
may be designed and synthesized chemically on the basis
of structures predicted to be required to interact with
domains of CFTR. This approach is reviewed in Capsey and
Delvatte, Genetically Engineered Human Therapeutic Drugs,
Stockton Press, New York, 1988. These potential drugs
must also be tested in the screening system.

Treatment of CF can be performed by replacing the
defective protein with normal protein, by modulating the
function of the defective protein or by modifying another
step in the pathway in which CFTR participates in order
to correct the physiological abnormality.

To be able to replace the defective protein with the
normal version, one must have reasonably large amounts of
pure CFTR protein. Pure protein can be obtained as
described earlier from cultured cell systems. Delivery
of the protein to the affected airways tissue will
require its packaging in lipid-containing vesicles that
facilitate the incorporation of the protein into the cell
membrane. It may also be feasible to use vehicles that
incorporate proteins such as surfactant protein, for
example SAP(Val) or SAP(Phe), that perform this function
naturally, at least for lung alveolar cells (PCT Patent
Application WO/8803170, Whitsett et al, May 7, 1988 and
PCT Patent Application WO89/04327, Benson et al, May 18,
1989). The CFTR-containing vesicles are introduced into
the airways by inhalation or irrigation, techniques that
are currently used in CF treatment (Boat et al, supra).
6.3.2 DRUG THERAPY 

Modulation of CFTR function can be accomplished by
the use of therapeutic agents (drugs). These can be
identified by random approaches using a screening program
in which their effectiveness in modulating the defective
CFTR protein is monitored in vitro. Screening programs
can use cultured cell systems in which the defective CFTR
protein is expressed. Alternatively, drugs can be
designed to modulate CFTR activity from knowledge of the
structure and function correlations of CFTR protein and
from knowledge of the specific defect in the CFTR mutant
protein (Capsey and Delvatte, supra). It is possible
that the mutant CFTR protein will require a different
drug for specific modulation. It will then be necessary
to identify the specific mutation(s) in each CF patient
before initiating drug therapy.

Drugs can be designed to interact with different
aspects of CFTR protein structure or function. For
example, a drug (or antibody) can bind to a structural
fold of the protein to correct a defective structure.
Alternatively, a drug might bind to a specific functional
residue and increase its affinity for a substrate or
cofactor. Since it is known that members of the class
of proteins to which CFTR has structural homology can
interact with, bind and transport a variety of drugs, it is
reasonable to expect that drug-related therapies may be
effective in treatment of CF.

A third mechanism for enhancing the activity of an
effective drug would be to modulate the production or the
stability of CFTR inside the cell. This increase in the
amount of CFTR could compensate for its defective
function.

Drug therapy can also be used to compensate for the
defective CFTR function by interactions with other
components of the physiological or biochemical pathway
necessary for the expression of the CFTR function. These
interactions can lead to increases or decreases in the
activity of these ancillary proteins. The methods for the
identification of these drugs would be similar to those
described above for CFTR-related drugs.

In other genetic disorders, it has been possible to
correct for the consequences of altered or missing normal
functions by use of dietary modifications. This has
taken the form of removal of metabolites, as in the case
of phenylketonuria, where phenylalanine is removed from
the diet in the first five years of life to prevent
mental retardation, or of the addition of large amounts
of metabolites to the diet, as in the case of adenosine
deaminase deficiency where the functional correction of
the activity of the enzyme can be produced by the
addition of the enzyme to the diet. Thus, once the
details of the CFTR function have been elucidated and the
basic defect in CF has been defined, therapy may be
achieved by dietary manipulations.

The second potential therapeutic approach is
so-called "gene-therapy", in which normal copies of the CF
gene are introduced into patients so as to successfully
code for normal protein in the key epithelial cells of
affected tissues. It is most crucial to attempt to
achieve this with the airway epithelial cells of the
respiratory tract. The CF gene is delivered to these
cells in a form in which it can be taken up and code for
sufficient protein to provide regulatory function. As a
result, the patient's quality and length of life will be
greatly extended. Ultimately, of course, the aim is to
deliver the gene to all affected tissues.

GENE THERAPY

One approach to therapy of CF is to insert a normal
version of the CF gene into the airway epithelium of
affected patients. It is important to note that the
respiratory system is the primary cause of morbidity and
mortality in CF; while pancreatic disease is a major
feature, it is relatively well treated today with enzyme
supplementation. Thus, somatic cell gene therapy [for a
review, see T. Friedmann, Science 244:1275 (1989)]
targeting the airway would alleviate the most severe
problems associated with CF.

A. Retroviral Vectors. Retroviruses have been
considered the preferred vector for experiments in
somatic gene therapy, with a high efficiency of infection
and stable integration and expression [Orkin et al Prog.
Med. Genet 7:130 (1988)]. A possible drawback is that
cell division is necessary for retroviral integration, so
that the targeted cells in the airway may have to be
nudged into the cell cycle prior to retroviral infection,
perhaps by chemical means. The full length CF gene cDNA
can be cloned into a retroviral vector and driven from
either its endogenous promoter or from the retroviral LTR
(long terminal repeat). Expression of levels of the
normal protein as low as 10% of the endogenous mutant
protein in CF patients would be expected to be
beneficial, since this is a recessive disease. Delivery
of the virus could be accomplished by aerosol or
instillation into the trachea.

B. Other Viral Vectors. Other delivery systems
which can be utilized include adeno-associated virus
[AAV, McLaughlin et al, J. Virol 62:1963 (1988)],
vaccinia virus [Hose et al Annu. Rev. Immunol. 5:305
(1987)], bovine papilloma virus [Rasmussen et al, Methods
Enzymol 139:642 (1987)] or a member of the herpesvirus
group such as Epstein-Barr virus [Margolskee et al Mol.
Cell. Biol 8:2937 (1988)]. Though much would need to be
learned about their basic biology, the idea of using a
viral vector with natural tropism for the respiratory
tract (e.g. respiratory syncytial virus, echovirus,
coxsackie virus, etc.) is possible.

C. Non-viral Gene Transfer. Other methods of
inserting the CF gene into respiratory epithelium may
also be productive; many of these are lower efficiency
and would potentially require infection in vitro,
selection of transfectants, and reimplantation. This
would include calcium phosphate, DEAE dextran,
electroporation, and protoplast fusion. A particularly
attractive idea is the use of liposomes, which might be
possible to carry out in vivo [Ostro, Liposomes,
Marcel-Dekker, 1987]. Synthetic cationic lipids such as
DOTMA [Felger et al Proc. Natl. Acad. Sci. USA 84:7413
(1987)] may increase the efficiency and ease of carrying
out this approach.

CF ANIMAL MODELS

The creation of a mouse or other animal model for CF
will be crucial to understanding the disease and for
testing of possible therapies [for general review of
creating animal models, see Erickson, Am. J. Hum. Genet.
43:582 (1988)]. Currently no animal model of CF
exists. The evolutionary conservation of the CF gene (as

TTZTsT the i Cr ° S8 - SPeCieS ™i,ation blots for 
20 Itl T M iS 8hOWn ln Pi9Ur ° «• i^icate that an 

orthologous gene exists in the mouse (hereafter to be
denoted mCF, and its corresponding protein as mCFTR), and
this will be possible to clone in mouse genomic and cDNA
libraries. It is expected
that the generation of a specific mutation in the mouse
gene analogous to the I507 mutation will be optimal
to reproduce the phenotype, though complete inactivation
of the mCFTR gene will also be a useful mutant to
generate.

A. Mutagenesis. Inactivation of the mCF gene can 
be achieved by chemical [e.g. Johnson et al Proc. Natl. 
Acad. Sci. USA 78:3138 (1981)] or X-ray mutagenesis [Popp 
et al J. Mol. Biol. 127:141 (1979)] of mouse gametes, 
followed by fertilisation. Offspring heterozygous for 
inactivation of mCFTR can then be identified by Southern 
blotting to demonstrate loss of one allele by dosage, or 
failure to inherit one parental allele if an RFLP marker 
is being assessed. This approach has previously been 
successfully used to identify mouse mutants for α-globin 
[Whitney et al Proc. Natl. Acad. Sci. USA 77:1087 
(1980)], phenylalanine hydroxylase [McDonald et al 
Pediatr. Res 23:63 (1988)], and carbonic anhydrase II 
[Lewis et al Proc. Natl. Acad. Sci. USA 85:1962 (1988)]. 
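
The RFLP test described above (failure of an offspring to inherit one parental allele) can be sketched as a simple check. This is only an illustrative sketch: the function name, the allele labels "A" and "B", and the assumption that the two parents carry distinguishable marker alleles are all hypothetical and are not taken from the specification.

```python
def missing_parental_alleles(sire, dam, offspring):
    """Return the parents from whom no marker allele is seen in the offspring.

    sire, dam  -- tuples of RFLP allele labels carried by each parent
    offspring  -- iterable of allele labels (bands) observed in the offspring

    The check is only informative when the parents carry different alleles;
    if they share an allele, a missing contribution cannot be detected.
    """
    observed = set(offspring)
    missing = []
    for name, alleles in (("sire", sire), ("dam", dam)):
        # No overlap means this parent's allele was not transmitted,
        # which is the signature of a candidate deletion or inactivation.
        if not observed & set(alleles):
            missing.append(name)
    return missing


# Hypothetical cross: mutagenized sire homozygous "A", dam homozygous "B".
normal_pup = missing_parental_alleles(("A", "A"), ("B", "B"), ("A", "B"))
candidate_pup = missing_parental_alleles(("A", "A"), ("B", "B"), ("B",))
```

Under these assumptions a pup showing only the dam's band would be flagged as a candidate carrier of an inactivated paternal allele, to be confirmed by dosage on a Southern blot.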
B. Transgenics. A mutant version of CFTR or mouse 
CFTR can be inserted into the mouse germ line using now 
standard techniques of oocyte injection [Camper, Trends 
in Genetics (1988)]; alternatively, if it is desirable to 
inactivate or replace the endogenous mCF gene, the 
homologous recombination system using embryonic stem (ES) 
cells [Capecchi, Science 244:1288 (1989)] may be applied. 

1. Oocyte Injection. Placing one or more copies 
of the normal or mutant mCF gene at a random location in 
the mouse germline can be accomplished by microinjection 
of the pronucleus of a just-fertilized mouse oocyte, 
followed by reimplantation into a pseudo-pregnant foster 
mother. The liveborn mice can then be screened for 
integrants using analysis of tail DNA for the presence of 
human CF gene sequences. The same protocol can be used 
to insert a mutant mCF gene. To generate a mouse model, 
one would want to place this transgene in a mouse 
background where the endogenous mCF gene has been 
inactivated, either by mutagenesis (see above) or by 
homologous recombination (see below). The transgene can 
be either: a) a complete genomic sequence, though the 
size of this (about 250 kb) would require that it be 
injected as a yeast artificial chromosome or a chromosome 
fragment; b) a cDNA with either the natural promoter or a 
heterologous promoter; c) a "minigene" containing all of 
the coding region and various other elements such as 
introns, promoter, and 3' flanking elements found to be 
necessary for optimum expression. 

2. Retroviral Infection of Early Embryos. 
This alternative involves inserting the CFTR or mCF gene 
into a retroviral vector and directly infecting mouse 
embryos at early stages of development, generating a 
chimera [Soriano et al Cell 46:19 (1986)]. At least some 
of these will lead to germline transmission. 

3. ES Cells and Homologous Recombination. The 
embryonic stem cell approach [Capecchi, supra, and 
Capecchi, Trends Genet 5:70 (1989)] allows the 
possibility of performing gene transfer and then 
screening the resulting totipotent cells to identify the 
rare homologous recombination events. Once identified, 
these can be used to generate chimeras by injection of 
mouse blastocysts, and a proportion of the resulting mice 
will show germline transmission from the recombinant 
line. There are several ways this could be useful in the 
generation of a mouse model for CF: 

a) Inactivation of the mCF gene can be conveniently 
accomplished by designing a DNA fragment which contains 
sequences from a mCFTR exon flanking a selectable marker 
such as neo. Homologous recombination will lead to 
insertion of the neo sequences in the middle of an exon, 
inactivating mCFTR. The homologous recombination events 
(usually about 1 in 1000) can be distinguished from the 
heterologous ones by DNA analysis of individual clones 
[usually using PCR, Kim et al Nucleic Acids Res 16:8887 
(1988), Joyner et al Nature 338:153 (1989); Zimmer et al 
supra, p. 150] or by using a negative selection against 
the heterologous events [such as the use of an HSV TK 
gene at the end of the construct, followed by the 
gancyclovir selection, Mansour et al, Nature 336:348 
(1988)]. This inactivated mCFTR mouse can then be used 
to introduce a mutant CF gene or mCF gene containing, for 
example, the I507 abnormality or any other desired 
mutation. 

b) It is possible that specific mutants of mCFTR 
can be created in one step. For example, one can make a 
construct containing mCF intron 9 sequences at the 5' 
end, a selectable neo gene in the middle, and intron 9 + 
exon 10 (containing the mouse version of the I507 
mutation) at the 3' end. A homologous recombination 
event would lead to the insertion of the neo gene in 
intron 9 and the replacement of exon 10 with the mutant 
version. 

c) If the presence of the selectable neo marker in 
the intron altered expression of the mCF gene, it would be 
possible to excise it in a second homologous 
recombination step. 

d) It is also possible to create mutations in the 
mouse germline by injecting oligonucleotides containing 
the mutation of interest and screening the resulting 
cells by PCR. 
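
The screening effort implied by the frequency quoted in (a) (homologous events at roughly 1 in 1000 clones) can be estimated with a short calculation. This is a minimal sketch assuming each clone is an independent trial with the same probability of being a homologous recombinant; the function name and the 95% confidence level used in the example are illustrative choices, not values from the specification.

```python
import math


def clones_to_screen(frequency, confidence):
    """Smallest number of clones n with P(at least one recombinant) >= confidence.

    Solves 1 - (1 - frequency)**n >= confidence for n, assuming independent
    clones that each carry a homologous event with probability `frequency`.
    """
    if not 0 < frequency < 1 or not 0 < confidence < 1:
        raise ValueError("frequency and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - frequency))


# Illustrative use with the 1-in-1000 figure from the text.
n_clones = clones_to_screen(frequency=1 / 1000, confidence=0.95)
```

A negative selection step such as the HSV TK and gancyclovir scheme mentioned in (a) would raise the effective frequency among surviving clones and so reduce this number.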

This embodiment of the invention has considered 
primarily a mouse model for cystic fibrosis. Figure 4 
shows cross-species hybridization not only to mouse DNA, 
but also to bovine, hamster and chicken DNA. Thus, it is 
contemplated that an orthologous gene will exist in many 
other species also. It is thus contemplated that it will 
be possible to generate other animal models using similar 
technology. 

Although preferred embodiments of the invention have 
been described herein in detail, it will be understood by 
those skilled in the art that variations may be made 
thereto without departing from the spirit of the 
invention or the scope of the appended claims. 
CLAIMS: 

1. A DNA molecule comprising an intronless DNA sequence 
encoding a mutant CFTR polypeptide having the sequence 
according to Figure 1 for amino acid residue positions 1 
to 1480 and, further characterized by nucleotide sequence 
variants resulting in deletion or alteration of amino 
acids of residue positions 85, 148, 178, 455, 493, 507, 
542, 549, 551, 560, 563, 574, 1077 and 1092. 

2. A DNA molecule comprising an intronless DNA sequence 
encoding a mutant CFTR polypeptide having the sequence 
according to Figure 1 for DNA sequence positions 1 to 
4575 and, further characterized by nucleotide sequence 
variants resulting in deletion or alteration of DNA at 
DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and 
3659. 
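
The relation between the DNA sequence positions of claim 2 and the amino acid residue positions of claim 1 can be sketched as follows. The sketch assumes that the initiator ATG begins at DNA position 133, as read from Figure 1; this is consistent with claims 1 and 2, since 1480 codons plus a stop codon starting at 133 end at position 4575, but it should be treated as an assumption. The function names are illustrative only, and positions written as 621+1, 711+1 or 1717-1 lie in introns and are outside the scope of this mapping.

```python
ATG_POSITION = 133  # assumption: first base of the initiator ATG in Figure 1


def codon_for_dna_position(position, atg_position=ATG_POSITION):
    """Map a cDNA position to (residue number, base within codon).

    Returns None for positions 5' of the initiator ATG.
    """
    offset = position - atg_position
    if offset < 0:
        return None
    return offset // 3 + 1, offset % 3 + 1


def dna_range_for_residue(residue, atg_position=ATG_POSITION):
    """Return the (first, last) cDNA positions of the codon for a residue."""
    first = atg_position + 3 * (residue - 1)
    return first, first + 2


# Exonic positions named in claim 2, and one residue named in claim 1.
position_129 = codon_for_dna_position(129)    # upstream of the ATG
position_556 = codon_for_dna_position(556)
position_3659 = codon_for_dna_position(3659)
residue_507 = dna_range_for_residue(507)
```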

3. A DNA molecule comprising an intronless DNA sequence 
selected from the group consisting of: 

(a) DNA sequences which correspond to the selected 
sequence of claim 1 or 2 and which encode, on expression, 
for mutant CFTR polypeptide; 

(b) DNA sequences which correspond to a fragment of 
a selected sequence in claim 1 or 2 including at least 16 
nucleotides; 

(c) DNA sequences which comprise at least 16 
nucleotides and encode a fragment of the selected amino 
acid sequence of claim 1 or 2; and 

(d) DNA sequences encoding an epitope 
characteristic of the mutant CFTR protein encoded by at 
least 18 sequential nucleotides in the selected sequence 
of claim 1 or 2. 

4. The DNA molecule of claim 1 or 2 wherein the DNA 
molecule is a cDNA. 

5. The DNA molecule of claim 3 wherein the DNA molecule 
is a cDNA. 



6. A purified RNA molecule comprising an RNA sequence 
corresponding to the DNA sequence recited in claim 3. 

7. A purified nucleic acid probe comprising a DNA or 
RNA nucleotide sequence corresponding to the selected 
sequence recited in parts (b), (c), or (d) of claim 3. 

8. A nucleic acid probe according to claim 7 wherein 
said sequence comprises AAA GAA AAT ATC TTT GGT GTT, and 
its complement. 
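
The complement referred to in claim 8 can be derived mechanically from the recited sequence. The sketch below is illustrative only: the helper name is hypothetical, and it writes the complementary strand in the conventional 5' to 3' orientation (the reverse complement), which is an assumption about how the complement would be presented.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the complementary strand of a DNA sequence, written 5' to 3'."""
    cleaned = sequence.replace(" ", "").upper()
    return cleaned.translate(COMPLEMENT)[::-1]


probe = "AAA GAA AAT ATC TTT GGT GTT"   # sequence recited in claim 8
probe_complement = reverse_complement(probe)
```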

9. A recombinant cloning vector comprising the DNA 
molecule of claim 3. 

10. The vector of claim 9 wherein said DNA molecule is 
operatively linked to an expression control sequence in 
said recombinant DNA molecule so that a mutant CFTR 
polypeptide can be expressed, said mutant CFTR 
polypeptide being selected from the group of CFTR 
polypeptides at mutant positions 85, 148, 178, 455, 493, 
507, 542, 549, 551, 560, 563, 574, 1077 and 1092, said 
expression control sequence being selected from the group 
consisting of sequences that control the expression of 
genes of prokaryotic or eukaryotic cells and their 
viruses and combinations thereof. 

11. The vector of claim 10 wherein said DNA molecule is 
operatively linked to an expression control sequence in 
said recombinant DNA molecule so that a mutant CFTR 
polypeptide can be expressed, said mutant CFTR 
polypeptide being selected from the group of CFTR 
polypeptides at mutant DNA sequence positions 129, 556, 
621+1, 711+1, 1717-1 and 3659, said expression control 
sequence being selected from the group consisting of 
sequences that control the expression of genes of 
prokaryotic or eukaryotic cells and their viruses and 
combinations thereof. 

12. The vector of claim 10 or 11 wherein the expression 
control sequence is selected from the group consisting of 
the lac system, the trp system, the tac system, the trc 
system, major operator and promoter regions of phage 
lambda, the control region of fd coat protein, the early 
and late promoters of SV40, promoters derived from 
polyoma, adenovirus, retrovirus, baculovirus and simian 
virus, the promoter for 3-phosphoglycerate kinase, the 
promoters of yeast acid phosphatase, the promoter of the 
yeast alpha-mating factors and combinations thereof. 

13. A host transformed with the vector according to 
claim 9. 

14. The host of claim 13 selected from the group 
consisting of strains of E. coli, Pseudomonas, Bacillus 
subtilis, Bacillus stearothermophilus, or other bacilli; 
other bacteria; yeast; fungi; insect; mouse or other 
animal; plant hosts; or human tissue cells. 

15. The host of claim 14 wherein said human tissue cells 
are human epithelial cells. 

16. A method for producing a mutant CFTR polypeptide 
comprising the steps of: 

(a) culturing a host cell transfected by the vector 
of claim 8 in a medium and under conditions favorable for 
expression of the mutant CFTR polypeptide selected from 
the group having mutant positions 85, 148, 178, 455, 493, 
507, 542, 549, 551, 560, 563, 574, 1077 and 1092; 

(b) isolating the expressed mutant CFTR 
polypeptide. 

17. A method for producing a mutant CFTR polypeptide 
comprising the steps of: 

(a) culturing a host cell transfected by the vector 
of claim 8 in a medium and under conditions favorable for 
expression of the mutant CFTR polypeptide selected from 
the group having mutant DNA sequence positions 129, 556, 
621+1, 711+1, 1717-1 and 3659; 

(b) isolating the expressed mutant CFTR 
polypeptide. 

18. A mutant CFTR polypeptide substantially free of 
other human proteins and encoded by the DNA sequence 
recited in claim 3. 

19. A substantially pure mutant CFTR polypeptide 
according to claim 18 made by chemical or enzymatic 
peptide synthesis. 

20. A polypeptide coded for by expression of a DNA 
sequence recited in claim 3. 

21. A method for screening a subject to determine if 
said subject is a CF carrier or a CF patient comprising 
the steps of: 

providing a biological sample of the subject to be 
screened; and providing an assay for detecting in the 
biological sample, the presence of at least a member from 
the group consisting of a mutant CF gene, a mutant CFTR 
polypeptide products and mixtures thereof, the mutants 
being defined by mutations at protein positions 85, 148, 
178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 
and 1092. 

22. A method for screening a subject to determine if 
said subject is a CF carrier or a CF patient comprising 
the steps of: 




providing a biological sample of the subject to be 
screened; and providing an assay for detecting in the 
biological sample, the presence of at least a member from 
the group consisting of a mutant CF gene, a mutant CFTR 
polypeptide products and mixtures thereof, the mutants 
being defined by mutations at DNA sequence positions 129, 
556, 621+1, 711+1, 1717-1 and 3659. 

23. The method of claim 21 or 22 wherein the biological 
sample includes at least part of the genome of the 
subject and the assay comprises a hybridization assay. 

24. The method of claim 23 wherein the assay further 
comprises a labelled nucleotide probe according to claim 
7. 

25. The method of claim 24 wherein said probe comprises 
the nucleotide sequence of claim 8. 

26. The method of claim 21 or 22 wherein the biological 
sample includes a CFTR polypeptide of the subject and the 
assay comprises an immunological assay. 

27. The method of claim 26 wherein the assay further 
includes an antibody specific for said mutant CFTR 
polypeptide. 

28. The method of claim 26 wherein the assay is a 
radioimmunoassay. 

29. The method of claim 27 wherein the antibody is at 
least one monoclonal antibody. 

30. The method of claim 21 or 22 wherein the subject is 
a human fetus in utero. 

31. The method of claim 24 wherein the assay further 
includes at least one additional nucleotide probe 
according to claim 7. 

32. The method of claim 31, wherein the assay further 
includes a second nucleotide probe comprising a different 
DNA sequence fragment of the DNA of Figure 1 or its RNA 
homologue or a different DNA sequence fragment of human 
chromosome 7 and located to either side of the DNA 
sequence of Figure 1. 



33. In a process for screening a potential CF carrier or 
patient to indicate the presence of an identified cystic 
fibrosis mutation in the CF gene, said process including 
the steps of: 

(a) isolating genomic DNA from said potential CF 
carrier or said potential patient; 

(b) hybridizing a DNA probe onto said isolated 
genomic DNA, said DNA probe spanning a mutation in said 
CF gene wherein said DNA probe is capable of detecting 
said mutation, said mutation being selected from the 
group of mutations at protein positions 85, 148, 178, 
455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 
1092; 

(c) treating said genomic DNA to determine presence 
or absence of said DNA probe and thereby indicating in 
accordance with a predetermined manner of hybridization 
the presence or absence of said cystic fibrosis mutation. 



34. In a process for screening a potential CF carrier or 
patient to indicate the presence of an identified cystic 
fibrosis mutation in the CF gene, said process including 
the steps of: 

(a) isolating genomic DNA from said potential CF 
carrier or said potential patient; 

(b) hybridizing a DNA probe onto said isolated 
genomic DNA, said DNA probe spanning a mutation in said 
CF gene wherein said DNA probe is capable of detecting 
said mutation, said mutation being selected from the 
group of mutations at DNA sequence positions 129, 556, 
621+1, 711+1, 1717-1 and 3659. 



35. A process for detecting cystic fibrosis carriers of 
a mutant CF gene wherein said process consists of 
determining differential mobility of heteroduplex PCR 
products in polyacrylamide gels as a result of deletions 
or alterations in the mutant CF gene at one or more of 
the protein positions 85, 148, 178, 455, 493, 507, 542, 
549, 551, 560, 563, 574, 1077 and 1092. 

36. A process for detecting cystic fibrosis carriers of 
a mutant CF gene wherein said process consists of 
determining differential mobility of heteroduplex PCR 
products in polyacrylamide gels as a result of deletions 
or alterations in the mutant CF gene at one or more of 
the DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 
and 3659. 

37. A kit for assaying for the presence of a mutant CF 
gene by immunoassay comprising: 

(a) an antibody which specifically binds to a gene 
product of a mutant CF gene having a mutation at a 
protein position selected from the group consisting of 
protein positions 85, 148, 178, 455, 493, 507, 542, 549, 
551, 560, 563, 574, 1077 and 1092; 

(b) reagent means for detecting the binding of the 
antibody to the gene product; and 

(c) the antibody and reagent means each being 
present in amounts effective to perform the immunoassay. 

38. A kit for assaying for the presence of a mutant CF 
gene by immunoassay comprising: 

(a) an antibody which specifically binds to a gene 
product of a mutant CF gene having a mutation at a DNA 
sequence position selected from the group consisting of 
DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and 
3659; 

(b) reagent means for detecting the binding of the 
antibody to the gene product; and 

(c) the antibody and reagent means each being 
present in amounts effective to perform the immunoassay. 

39. The kit of claim 37 or 38 wherein said reagent means 
for detecting binding is selected from the group 
consisting of fluorescence detection, radioactive decay 
detection, enzyme activity detection or colorimetric 
detection. 

40. A kit for assaying for the presence of a CF gene by 
hybridization comprising: 

(a) an oligonucleotide probe which specifically 
binds to a mutant CF gene; 

(b) reagent means for detecting the hybridization 
of the oligonucleotide probe to a mutant CF gene having a 
mutation at a protein position selected from the group 
consisting of protein positions 85, 148, 178, 455, 493, 
507, 542, 549, 551, 560, 563, 574, 1077 and 1092; and 

(c) the probe and reagent means each being present 
in amounts effective to perform the hybridization assay. 

41. A kit for assaying for the presence of a CF gene by 
hybridization comprising: 

(a) an oligonucleotide probe which specifically 
binds to a mutant CF gene; 

(b) reagent means for detecting the hybridization 
of the oligonucleotide probe to a mutant CF gene having a 
mutation at a DNA sequence position selected from the 
group consisting of DNA sequence positions 129, 556, 
621+1, 711+1, 1717-1 and 3659; and 

(c) the probe and reagent means each being present 
in amounts effective to perform the hybridization assay. 

42. An animal comprising a heterologous cell system 
comprising a recombinant cloning vector of claim 9 which 
induces cystic fibrosis symptoms in said animal. 

43. The animal of claim 42 wherein said animal is a 
mammal. 

44. The animal of claim 43 wherein said mammal is a 
rodent. 

45. The animal of claim 44 wherein said rodent is a 
mouse. 

46. In a polymerase chain reaction to amplify a selected 
exon of a cDNA sequence of Figure 1, the use of 
oligonucleotide primers from intron portions near the 5' 
and 3' boundaries of the selected exon of Figure 18. 

47. In a polymerase chain reaction of claim 46, the use 
of oligonucleotide primers Xi-5 and Xi-3 of Table 5 where 
X is the exon number 1, 3, 4, 5, 6a, 6b, 7 through 13, 
14a, 14b, 15 and 16, 17a, 17b and 18 through 24. 

FIG. 1 

AATTGGAAGCAAATGACATCACAGCAGGTCAGAGAAAAAGGGTTGAGCGGCAGGCACCCA 

GAGTAGTAGGTCTTTGGCATTAGGAGCTTGAGCCCAGACGGCCCTAGCAGGGACCCCAGC 

MQ RSPLEKASVVSKLF 
GCCCGAGAGACCATGCAGAGGTCGCCTCTGGAAAAGGCCAGCGTTGTCTCCAAACTTTTT 



FSWTRPILRKGYRQRLELSD 
TTCAGCTGGACCAGACCAATTTTGAGGAAAGGATACAGACAGCGCCTGGAATTGTCAGAC 



WDREL ASKKNPKL I NALRRC 
TGGGATAGAGAGCTGGCTTCAAAGAAAAATCCTAAACTCATTAATGCCCTTCGGCGATGT 

F F W RlFMFYGTFLYLGiEVTK"D 
TTTTTCTGGAGATTTATGTTCTATGGAATCTTTTTATATTTAGGGfcAAGTCACCAAAGCA 



F S L 

TTTAGTTTGATTTATAAGAAGhCTTTAAAGCTGTCAAGCCGTGTOCTAGATAAAATAAGT 



I G 0 L V S LLSNNLNKFD 
ATTGGACAACTTGTTAGTCTCCTTTCCAACAACCTGAACAAATTTGATGi 

I L A 



e|g[ 

AAbGil 



ACTTGCA 



JL 



V W I APLOVAT. T, M 



L I 1 W 



WEEAME KMIEKLRd' 
TGGGAAGAAGCAATGGAAAAAATGATTGAAAACTTAAGACARA( 



AVQTWY DSLGAIN 
GCTGTACAAACATGGTATGACTCTCTTGGAGCAATAAACAAAATACActATTTCTTACAA 



VTA 

GTAACAGOrTTCTGGGAGGAGl^ATTTGGGGAATTATTTGAGAAAGCAAi^ 



16 



36 



IYQIPSVDSADNLSEKLErfE 56 
ATATACCAAATCCCTTCTGTTGATTCTGCTGACAATCTATCTGAAAAATTGGAAAQAGAA 



76 



96 



116 



I V 0 P L L L I GRIIASYDPDNKEE 
GTACAGCCTCTCTTACTGGGAAGAATCATAGCTTCCTATGACCCGGATAACAAGGAGGAA 

RlSIAIYLGTGT. P LLFTVRTLl 136 
CGCTCTATCGCGATTTATCTAGGCATAGGCTTATGCCTTCTCTTTATTGTGAGGACACTG 

I L L I HPAI FGLHHIGMQMRIAM 156 
CTCCTACACCCAGCCATTTTTGGCCTTCATCACATTGGAATGCAGATGAGAATAGCTATG 



I YK KlTLKLSSRVL DK IS 176 

ik fr- — — 



L ftl 196 



216 



TTGGCACATTTCGTGTGGATCGCTCCTTTGCAACTGGCA 

ELL Ql A S A F C G L G F T. I y t. ft l Fl 236 
GAGTTGTTACAGGCGTCTGCCTTCTGTGGACTTGG^ 

1 P A G L Gl RMMMKYRDQRAGKIS 256 
CAGGCTGGGCTAGGGAGAATGATGATGAAGTACAGAGATCAGAGAGCTGGGAAGATCAGT 

ERLVITSEMIENIQSVKAYC 276 
GAAAGACTTGTGATTACCTCAGAAATGATTGAAAATATCCAATCTGTTAAGGCATACTGC 



T E L K L T 296 
CAGAACTGAAACTGACT 



RKAAYVRYFN S iSAPFFSGFFi 316 
CGGAAGGCAGCCTATGTGAGATACTTCAATAGCTCAGCCTTCTTCTTCTCAGGGTTCTTT 

IVVFLSVLPVAI. tI G T T L R K Tl 336 
GTGGTGTTTTTATCTGTGCTTCCCTATGCACTAATCAAAGGAATCATCCTCCGGAAAATA 

IFTTISFCIVT. ft M A V I T R Q F P W 356 
TTCACCACCATCTCATTCTGCATTGTTCTGCGCATGGCGGTCACTCGGCAATTTCCCTGG 



K I Q I D F L Q 376 



KQEYKTLEYNLTTTEVVMEN 396 
AAGCAAGAATATAAGACATTGGAATATAACTTAACGACTACAGAAGTAGTGATGGAGAAT 



FWEElGFGELFEKAKONN 416 
pG? 

FIG. 1 (cont'd) 

N G D D 



SLA 



TCTTTAGCAAGI 



Ik 



X V Y X D A 



LPS 



GCAGTATACAAAGATGCTGATTTGTATTTATTAGACTCTCCTTTTGGA 



YLPVLTEK RIFE 



C V C K L M A 
TACCTAGATGTTTXAACAGAAAAAGAAATATTTGAAAcfcTGTGTCTGTAAACTGATGGCT 



1 



436 



AACAATAGAAAAACTTCTAATGGTGATGACAGCCTCTTCTTCAGTAATTTCTCACTTCTT 

GTPVLKD INFKIERCQLL A V 4S6 

GGTACTCCTGTCCTGAAAGATATTAATTTCAAGATAGAAAGAGGACAGTTGTTGGCGGTT 

ACSTCAGKlTSL LMMlMGRLE 476 

GCTGGATCCACTGGAGCAGGCAADACTrCACTTCTAATGATGATrATGGGAGAACTGGAG 

JE S — £ GKIKH3 CRI S rgfl Q p s W 4 9fi 

CCTTCAGAGGGTAAAATTAAGCACAGTGGAAGAATTTCA TTCTGTTCTCAGTTTTCCTGG 
• A 
IMPGTIK gKIlFGVSYn g Y R 

ATTATGCCTGGCACCATTAAAGAAAATATCAT CTTTGGTGTTTCCTATGATGAATATAGA 

YR3VIKACQLE E J p I S y y » K 
TACAGAAGCGTCATCAAAGCATGCCMCTAGAAGA(J3ACATCTCC MGTTTGCAGAGAAA 

PNIVtOEGCI TLflCGOn ART S56 

GACAATATAGTTCT^ 356 



516 



576 



596 



NKTRILVTSKMEHLKKADKI 616 
AACAAAACTAGGATTCTGGTCACTTCTAAAATGGAACATTTAAAGAAAGCTGACAAAATA 

LILHEGSSYFYGTFSELQNL 636 
TTAATTTTGCATGAAGGTAGCAGCTATTTTTATGGGACJITTTTCAGAACTCCAAAATC^A 

QPPFSSKLMGCPSFDQFSAE 656 
CAGCCAGACTTTAGCTCAAAACTCATGGGATGTGATTCTTTCGACCAATrTAGTGCAGAA 

RRNSILTETLHRFSLEGDAP 676 
AGAAGAAATTCAATCCTAACTGAGACCTTACACCGTTTCTCATTAGJ^GGAGATGCTCCT 

VSWTETKKQ?FKQTGEFGEK 696 
GTCTCCTGGACAGAAACAAAAAAACAATCTTTTAAACAGACTGGAGAGTTTGGGGAAAAA 

RKNS I L N P I N S I RKF5 IVQK 716 
AGGAAGAATTCTATTCTCAATCCAATCAACTCTATACGAAAATTTTCCATTGTGCAAAAG 

TPLQMNGIEEDSDEPLERRL 736 
ACTCCCTTACAAATGAATGGCATCGAAGAGGATTCTGATGAGCCTTTAGAGAGAAGGCTG 
o 

SLVPDSEQGEAILPRISVIS 756 
TCCTTAGTACCAGATTCTGAGCAGGGAGAGGCGATACTGCCTCGCATCAGCGTGATCAGC 

TGPTLQARRRQSVLNLMTHS 776 
ACTGGCCCCACGCTTCAGGCACGAAGGAGGCAGTCTGTCCTGAACCTGATGACACACTCA 

VNQGQNI HRKTTASTRKVSL 796 
GTTAACCAAGGTCAGAACATXCACCGAAAGACAACAGCATCCACACGAAAAGTGTCACTG 

APQANLTELD I YSRRLSQET 816 
GCCCCTCAGGCAAACTTGACTGAACTGGATATATATTCAAGAAGGTTATCTCAAGAAACT 

FIG. 1 (cont'd) 



GLEISEEI NEEDLKlECFFDD 836 
GGCTTGGAAATAAGTGAAGAAATTAACGAAGAAGACTTAAAqGAGTGCCTTTTTGATGAT 

MESIPAVTT WNTYLRY ITVH 856 
ATGGAGAGCATACCAGCAGTGACTACATGGAACACATACCTTCGATATATTACTGTCCAC 



K S 



lGfvt, t w c t, VTFT. A E v 
AAGAGCTTAATTTTTGTGCTAATTTGGTGCTTAGTAATTTTTCTGGCAGAGGTGGCTGCT 



I S L V YD L WLLGrfTPLQDKGN, 
TCTTTGGTTGTGCTGTGGCTCCTTGGAAAfcACTCCTCTTCAAGACAAAGGGAATAGTACT 



S T 



876 



896 



HSRNNSYAVI ITSTS Is Y V V F I 916 
C ATAGTA GAAATAACAGCT ATGC AG TGATTATCACCAGCACCAGTTCGTATTATGTGTTT 



lY I YVGVADT LLAMGFlH R G L P 
TACATTTACGTGGGAGTAGCCGACACTTTGCTTGCTATGGGATTCTTCAGAGGTCTACCA 



936 



LVHTLITVSK ILHHKMLHSV 956 
CTGGTGCATACTCTAATCACAGTGTCGAAAATTTTACACCACAAAATGTTACATTCTGTT 

LQAPMSTLNTLKAbGILNRF 976 
CTTCAAGCACCTATGTCAACCCTCAACACGTTGAAAGCAGGTGGGATTCTTAATAGATTC 



SKDIAILDDLLPL T I I F D F I ol 
TCCAAAGATATAGCAATTTTGGATGACCTTCTGCCTCTTACCATATTTGACTTCATCCAq 

iLLLIVIGAIAVVAVlJ Q p| Y I Fl 1 
TTGTTATTAATTGTGATTGGAGCTATAGCAGTTGTCGCAGTTTXACAACCCTACATCTTT 



996 



016 



IV A 



T V P_ V TVAFTMT. P AVPT.t q T 1Q36 



GTTGCMCAGTGCCAGTGATAGTGGCTTTTATTATGTTGAGAGCATATTTCCTCCAAACC 

SQQLKQLE SEGRSP IFTHLV 1056 
TCACAGCAACTCAAACAACTGGAATCTGAAGGCAGGAGTCCAATTTTCACTCATCTTGTT 

TSLKGLWTLRAFGRQPYFET 1076 
ACAAGCTTAAAAGGACTATGGACACrrCGTGCCTTCGGACGGCAGCCTTACTTTGAAACT 

LFHKALNLHTANWFLYLSTL 1096 
CTGTTCCACAAAGCTCTGAATTTACATACTGCCAACTGGTTCTTCT 

F 0 M R | I 



R W 



CGCTGGTTCCAAATGAGA ATAGAAATGATOTTTGTCATCTTCTT 

I I 8 I L T _LJ£j E g e G f1 v c t t r, t t. &l u 36 



SILT 
ATTTCCATTTTAACAACAi 



£|gA< 



gaaggagaaggaagagttggtattatcctgactttagcc 



IMNIMSTI. OWAV NS Sl I D V D S Ll 1156 

atgaatatcatgagtacattgcagtgggctgtaaactccagcatagatgtggatagcttd 

MRSVSRVFKFIDMPTEGKPT 1176 
ATGCGATCTCTGAGCCGAGTCTTTAAGTTCATTGACATGCCAACAGAAGGTAAACCTACC 

KSTKPYKNGQLSKVMIIENS 1196 
AAGTCAACCAAACCATACAAGAATGGCCAACTCTCGAAAGTTATGATTATTGAGAATTCA 



W PSGGQMTVK # 



H VKKD D I w f 5 g GQMTVK D L T 
CACGTGAAGAAAGATGACATCTGGCCCTCAGGGGGCCAAATGACTGTCAAAGATCTCACA 

A K Y T g GGW AILRKISra X a P 1236 

GCAAAATACACAGAAGGTGGAAATGCCATATTAGAGAACATTTCCTTCTCAATAAGTCCT 



gorI vgllgrtcsc 
jCt 



ggccagagg< 



K S T J, L S A 1256 



;tgggcctcttgggaagaactggatcagggaagagtactttgttatcagct 

FLRLLNTEGEIQIDGVSWDS 1276 
TTTTTGAGACTACTGAACACTGAAGGAGAAATCCAGATCGATGGTGTGTCTTGGGATTCA 

ITLQQKRKAF GVIPOKVPTy 1296 
ATAACTTTGCAACAGTGGAGGAAAGCCTTXGGAGTGATACCACAGAAAGTATTTATTTTT 



SGTFRKNLDPYEQWSDQEIW 1316 
TCTGGAACATTTAGAAAAAACTTGGATCCCTATGAACAGTGGAGTGATCAAGAAATATGG 

K V A D ElV CLRSVIEOFPGICLP 1336 
AAAGTTGCAGATGAGtTTGGGCTCAGATCTGTGATAGAACAGTTTCCTGGGAAGCTTGAC 

FVLVPGGCVItSHGHKQLMCL 1356 
TTTGTCCTTGTGGATGGGGGCTGTGTCCTAAGCCATGGCCACAAGCAGTTGATGTGCTTG 

ARgVLSKAKILLLP EPSAHI. 1376 
GCTAGATCTGTTCTCAGTAAGGCGAAGATCTTGCTGCTTGATGAACCCAGTGCTCATTTG 

PPVl TYQI I R R TLKQAFADCT 1396 



GATCCAGTfUVCATACCAAATAATTAGAAGAACTCTAAAACAAGCATTTGCTGATTGCACA 

V I I C E H R I EAMLECQQF L|V I 1416 
GTAATTCTCTGTGAACACAGGATAGAAGCAATGCTGGAATGCCAACAATTTTTGbTCATA 

EENKVRQYDSIQKLLNERSL 1436 
GAAGAGAACAAAGTGCGGCAGTACGATTCCATCCAGAAACTGCTGAACGAGAGGAGCCTC 

F R Q A I SP SDRVKLFPHRNSS 1456 
TTCCGGCAAGCCATCAGCCCCTCCGACAGGGTGAAGCTCTTTCCCCACCGGAACTCAAGC 

KCKSKPQIAALKEETEEEVQ 1476 
AAGTGCAAGTCTAAGCCCCAGATTGCTGCTCTGAAAGAGGAGACAGAAGAAGAGGTGCAA 



GATACAAGGCTTTAGAGAGCAGCATAAATGTTGACATGGGACATTTGCTCATGGAATTGG 

AGCTCGTGGGACAGTCACCTCATGGAATTGGAGCTCGTGGAACAGTTACCTCTGCCTCAG 

AAAACAAGGATGAATTAAGTTTTTTTTTAAAAAAGAAACATTTGGTAAGGGGAATTGAGG 

ACACTGATATGGGTCTTGATAAATGGCTTCCTGGCAATAGTCAAATTGTGTGAAAGGTAC 

TTCAAATCCTTGAAGATTTACCACTTGTGTTTTGCAAGCCAGATTTTCCTGAAAACCCTT 

GCCATGTGCTAGTAATTGGAAAGGCAGCTCTAAATGTCAATCAGCCTAGTTGATCAGCTT 

ATTGTCTAGTGAAACTCGTTAATTTGTAGTGTTGGAGAAGAACTGAAATCATACTTCTrA 

GGGTTATGATTAAGTAATGATAACTGGAAACrTCAGCGGTTTATATAAGCTTGTATTCCT 

TTTTCTCTCCTCTCCCCATGATGTTTAGAAACACAACTATATTGTTTGCTAAGCATTCCA 

ACTATCTCATTTCCAAGCAAGTATTAGAATACCACAGGAACCACAAGACTGCACATCAAA 

ATATGCCCCATTCAACATCTAGTGAGCAGTCAGGAAAGAGAACTTCCAGATCCTGGAAAT 

CAGGGTTAGTATTGTCCAGGTCTACCAAAAATCTCAATATTTCAGATAATCACAATACAT 

CCCTTACCTGGGAAAGGGCTGTTATAATCTTTCACAGGGGACAGGATGGTTCCCTTGATG 

AAGAAGTTGATATGCCTTTTCCCAACTCCAGAAAGTGACAAGCTCACAGACCTTTGAACT 

AGAGTTTAGCTGGAAAAGTATGTTAGTGCAAATTGTCACAGGACAGCCCTTCTTTCCACA 

GAAGCTCCAGGTAGAGGGTGTGTAAGTAGATAGGCCATGGGCACTGTGGGTAGACACACA 

TGAAGTCCAAGCATTTAGATGTATAGGTTGATGGTGGTATGTTTTCAGGCTAGATGTATG 

TACTTCATGCTGTCTACACTAAGAGAGAATGAGAGACACACTGAAGAAGCACCAATCATG 

MTTAGTTTTATATGCTTCTGTTTTATAATTrTGTGAAGCAAAATTTTTTCTCTAGGAAA 

TATTTATTTTAATAATGTTTCAAACATATATTACAATGCTGTATTTTAAAAGAATGATTA 

TGMTTACATTTGTATAAAATAATTTTTATAT1TGAAATATTGACTTTTTATGGCACTAG 

TATTTTTATGAAATATTATGTTAAAACTGGGACAGGGGAGAACCTAGGGTGATATTAACX 

AGGGGCCATGAATCACCTTTTGGTCTGGAGGGAAGCCTTGGGGCTGATCGAGTTGTTCCC 

CACAGCTGTATGATTCCCAGCCAGACACAGCCTCTTAGATGCAGTTCTGAAGAAGATGGT 

ACCACCAGTCTGACTGTTTCCATCAAGGGTACACTGCCTTCTCAACTCCAAACTGACTCT 

TAAGAAGACTGCATTATATTTATTACTGTAAGAAAATATCACTTGTCAATAAAATCCATA 
CATTTGTGT (A) n 